‘STUDENT’ AND SMALL SAMPLE THEORY


B. L. Welch
University of Leeds, England 


This year marks the Fiftieth Anniversary of the publication of 
“Student’s” distribution. It is an appropriate time to reconsider the im- 
pact of this part of “Student’s” work on the development of statistics. 


1. INTRODUCTION


When he died in 1937 W. S. Gosset occupied an enviable position among
statisticians. He was universally respected for the originality of his statistical
work and for the attractive way in which he presented it. The contributions
which are now best remembered when we allude to ‘Student’ (the pseudonym
under which he wrote) are the early papers on theoretical topics.
He was, however, equally admired for the nice sense of proportion which 
governed all his statistical reasoning. This sense was evident in the valuable 
suggestions which he made concerning the conduct of experiments and surveys, 
looking ahead always to an eventual statistical analysis which would be both 
simple and informative. The many-sided nature of the man is apparent to any- 
one who glances, however casually, through the volume of his collected papers 
[14] and was brought out clearly by a number of writers when he died (cf.
particularly Pearson [10]). It is difficult to add much to this general picture 
but I intend to refer to certain aspects which are made topical by the fact that 
just fifty years have elapsed since the publication of the two well-known papers 
(i) “The probable error of a mean” and (ii) “The probable error of a correlation 
coefficient.” I propose to discuss these two 1908 papers but, in doing so, I shall 
take for granted some familiarity with their main contents and try to see them 
against the background of the contributions made by other authors to related 
problems. This will necessitate some brief description of inverse probability 
arguments although, as we shall see, Gosset’s work was ultimately to strengthen 
the reaction against this approach to statistical inference. 


2. THE THEORY OF ERRORS 


A large number of books on the reduction of observations were written in 
the later decades of the nineteenth century, most of them aiming to illustrate 
and thus make more accessible the very general computational methods pub- 
lished by Gauss in 1821-6 [7]. Their object was to show how to obtain from 
scientific observations estimates of physical quantities together with indications 
of their reliability. It had become usual to express precision in terms of “probable
errors” and most authors made at least some brief attempt to say how a
“probable error” was to be interpreted in terms of probability theory. A typical 
statement, for instance, is the following from an American text by W. W. 
Johnson [9, p. 50]. 
“The probable error of a final result is frequently written after it with the sign ±.
Thus, if the final determination of an angle is given as 36° 42′.3 ± 1′.22, the meaning
is that the true value of the angle is exactly as likely to lie between the limits thus
assigned (that is, between 36° 41′.08 and 36° 43′.52) as it is to lie outside of these
limits.”
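The probable-error convention in Johnson’s example can be made concrete with a short sketch. It assumes Gaussian errors, so that the probable error is the standard error multiplied by the upper quartile of the unit Gaussian (the figure 0.67449 that recurs later in this paper); the function names are illustrative only.

```python
from statistics import NormalDist

# Upper quartile of the unit Gaussian: the multiple that turns a standard error
# into a "probable error" (half-width of a central 50 per cent interval).
PE_MULTIPLE = NormalDist().inv_cdf(0.75)

def probable_error(standard_error):
    """Probable error implied by a standard error, assuming Gaussian errors."""
    return PE_MULTIPLE * standard_error

def probable_error_limits(estimate, pe):
    """Limits estimate -/+ pe, which on the theory-of-errors reading are
    as likely as not to contain the true value."""
    return estimate - pe, estimate + pe

# Johnson's angle, 36 deg 42'.3 with probable error 1'.22, worked in minutes of arc.
lo, hi = probable_error_limits(42.3, 1.22)
print(f"PE multiple = {PE_MULTIPLE:.5f}")
print(f"limits: 36 deg {lo:.2f}' to 36 deg {hi:.2f}'")

# Check that +/- one probable error encloses half of a Gaussian law.
g = NormalDist()
print(round(g.cdf(PE_MULTIPLE) - g.cdf(-PE_MULTIPLE), 6))
```

The limits printed are Johnson’s 41′.08 and 43′.52; the 50 per cent content holds only under the Gaussian assumption built into `PE_MULTIPLE`.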

In thus asserting the equal likelihood that a “true” value will be contained 
within or excluded from the assigned range, writers on the theory of errors 
almost invariably had in mind a hypothetical long run of repetitions, consisting 
not only of the results one might obtain by repeating the measurements on the 
same true quantity but also by measuring other true quantities of a similar 
nature. Out of this global set of hypothetical repetitions one might then, in 
theory, construct the sub-set for which the measurements are identical with 
those actually realized in the investigation under review. The assertion then 
made was that, in this sub-set, on 50 per cent of occasions the true quantity 
being measured would lie in the range calculated with the aid of the probable 
error formula. 

That the accuracy of such probability statements depends among other 
things on prior assumptions about the distribution of the true values being 
presented for measurement was of course well known; it was also realized that, 
if the number of measurements made is small, as it is very apt to be in practice, 
a change in the prior assumption can seriously alter the probability which 
should be associated with the calculated limits. Since, however, this does not 
seem to have been regarded as a matter of great importance, one must conclude 
that contemporary scientific users of the method of least squares were as a 
rule content with it simply as a very convenient method of estimation. They 
were happy to have standard errors (or probable errors) as indicating, in a 
broad comparative way, the merits of the estimates obtained, but the exact 
expression of probability, derived from an application of the Bayes-Laplace 
method, must to the majority of them have been of secondary importance. 

The position of the application of inverse probability theory has not, in the 
opinion of the present writer, changed much down to the present day, despite 
the illuminating attempts by Jeffreys [8] to put the choice of prior distribution 
functions on a rational basis and to remove the whole theory from the context 
of a frequency interpretation of probability. 


3. SMALL SAMPLE THEORY OF THE MEAN 


In the presence of this overriding uncertainty about prior distribution func- 
tions of true values, writers approaching the subject from this viewpoint could 
comfortably overlook several other difficulties. In particular it was customary 
to derive a standard error of a mean by using the observed minimized sum of 
squares, but in calculating probable error therefrom the random fluctuation of 
this quantity was ignored. For in fact if the data were sufficiently sparse for this 
fluctuation to matter, the assumed prior distribution for the true values would 
at the same time become of critical importance. This was realized by no one better
than F. Y. Edgeworth, who nevertheless did in 1883 still consider it worth
while to develop the small sample theory of a mean value to some extent [2]. 
In describing what he had to say in this context I shall translate his remarks 
from the language of the theory of errors to that of present-day statistics, but 
trust that otherwise I shall not alter the sense. 

Suppose $x_1, x_2, \ldots, x_n$ are independent Gaussian variables with expectation
$\mu$ and standard deviation $\sigma$. Denoting them collectively by $S$, we may write
their distribution, for given $\mu$ and $\sigma$, as


$$f(S\mid\mu,\sigma)\,dS = (2\pi)^{-n/2}\sigma^{-n}\exp\Big\{-\tfrac{1}{2}\sigma^{-2}\sum_{i=1}^{n}(x_i-\mu)^2\Big\}\,dx_1\cdots dx_n. \qquad (1)$$


Suppose also that we are given a prior distribution $g(\mu,\sigma)\,d\mu\,d\sigma$ for $\mu$ and $\sigma$.
Then the joint distribution of sample and parameters is 


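$$h(S,\mu,\sigma)\,dS\,d\mu\,d\sigma = f(S\mid\mu,\sigma)\,g(\mu,\sigma)\,dS\,d\mu\,d\sigma. \qquad (2)$$
The posterior distribution of $\mu$ and $\sigma$, given a realized $S$, is therefore
$$p(\mu,\sigma\mid S)\,d\mu\,d\sigma \propto h(S,\mu,\sigma)\,d\mu\,d\sigma \qquad (3)$$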


(the constant of proportionality being obtained by integrating out over $\mu$ and $\sigma$).
The posterior distribution of $\mu$ alone is then


$$p(\mu\mid S)\,d\mu = \int_{\sigma} p(\mu,\sigma\mid S)\,d\mu\,d\sigma. \qquad (4)$$


For the prior distribution $g(\mu,\sigma)\,d\mu\,d\sigma$, Edgeworth assumed the form $C\sigma^{-2}\,d\mu\,d\sigma$,
which follows by taking $\mu$ and $\sigma$ to be independent, $\mu$ to be uniformly distributed, and the
precision constant $h = 2^{-1/2}\sigma^{-1}$ to have a uniform distribution. Making the necessary
substitutions [2, p. 367] he arrived at the equation


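$$p(\mu\mid S)\,d\mu = K\Big\{1 + n(\bar x-\mu)^2\Big/\sum (x_i-\bar x)^2\Big\}^{-(n+1)/2}\,d\mu. \qquad (5)$$

On writing $t = \sqrt{n(n-1)}\,(\bar x-\mu)\big/\big[\sum (x_i-\bar x)^2\big]^{1/2}$, this yields
$$p(t\mid S)\,dt \propto \{1 + t^2/(n-1)\}^{-(n+1)/2}\,dt. \qquad (6)$$

If $n$ is large we may expand (6) in powers of $(n-1)^{-1}$ to give
$$p(t)\,dt \propto \exp\Big\{-\frac{t^2}{2} + \frac{t^4-4t^2}{4(n-1)} + \cdots\Big\}\,dt. \qquad (7)$$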
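Both steps in Edgeworth’s argument, integrating $\sigma$ out under the prior $C\sigma^{-2}\,d\mu\,d\sigma$ to reach (5), and expanding (6) to reach (7), can be checked numerically. The sketch below is not Edgeworth’s own working: the sample values and function names are hypothetical, and the $\sigma$-integration is a plain midpoint rule on $\log\sigma$, chosen only for simplicity.

```python
import math

def log_post_mu_numeric(mu, xs, u_lo=-8.0, u_hi=8.0, steps=40_000):
    """Unnormalized log posterior of mu, integrating sigma out of
    f(S | mu, sigma) * sigma**-2 numerically (constants dropped).
    Substituting sigma = exp(u) turns d(sigma) into sigma du."""
    n = len(xs)
    a = sum((x - mu) ** 2 for x in xs)      # sum of squares about mu
    h = (u_hi - u_lo) / steps
    total = 0.0
    for k in range(steps):
        sigma = math.exp(u_lo + (k + 0.5) * h)  # midpoint rule in u = log(sigma)
        # sigma**-n (likelihood) * sigma**-2 (prior) * sigma (from d sigma)
        total += sigma ** (-n - 1) * math.exp(-a / (2 * sigma * sigma)) * h
    return math.log(total)

def log_post_mu_eq5(mu, xs):
    """Unnormalized log of equation (5)."""
    n = len(xs)
    xbar = sum(xs) / n
    ss = sum((x - xbar) ** 2 for x in xs)
    return -(n + 1) / 2 * math.log(1 + n * (xbar - mu) ** 2 / ss)

def log_eq6(t, n):
    """Log of the kernel of (6)."""
    return -(n + 1) / 2 * math.log1p(t * t / (n - 1))

def log_eq7(t, n):
    """Log of the large-n form (7), truncated after the 1/(n-1) term."""
    return -t * t / 2 + (t ** 4 - 4 * t * t) / (4 * (n - 1))

xs = [4.9, 5.6, 5.1, 6.2, 5.4]   # hypothetical sample, for illustration only
xbar = sum(xs) / len(xs)
for mu in (4.5, 5.0, 5.5, 6.0):
    # Both log curves are shifted to zero at mu = xbar so their shapes can be compared.
    num = log_post_mu_numeric(mu, xs) - log_post_mu_numeric(xbar, xs)
    closed = log_post_mu_eq5(mu, xs) - log_post_mu_eq5(xbar, xs)
    print(f"mu={mu}: numeric {num:.6f}  eq.(5) {closed:.6f}")

for n in (11, 101, 1001):
    print(n, log_eq6(1.5, n) - log_eq7(1.5, n))   # shrinks roughly like 1/n**2
```

The two columns agree to the accuracy of the quadrature, and the gap between (6) and (7) falls off quickly with $n$, which is all that the truncated expansion (7) claims.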


Edgeworth termed (5) a sub-exponential distribution and noted that the factor 
needed to give the “probable error” now differs from the standard Gaussian 
multiple, although, as equation (7) shows, with large enough n there is no dif- 
ference. Since one has in practice to deal with small groups of observations it 
might appear that one should attach great importance to equation (5). Edgeworth
never did so, however, because he realized that a change in the assumed
form of $g(\mu,\sigma)$ would have a decisive influence. In equation (7) the corrective
term $(t^4-4t^2)/4(n-1)$ to the large sample result would have to be replaced by
something else if $g(\mu,\sigma)$ were altered. Unless, therefore, we possess, as Edgeworth
did not, some unequivocal method of deciding upon $g(\mu,\sigma)$, we are not
much further forward and the use of the Gaussian multiple, as generally prac- 
tised, could scarcely be subjected to severe criticism. 
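One way to see why the corrective term in (7) is hostage to the prior is to redo the calculation for a family of priors. Assume, purely for illustration, $g(\mu,\sigma)\,d\mu\,d\sigma \propto \sigma^{-k}\,d\mu\,d\sigma$ with $k > 0$, so that Edgeworth’s choice is $k = 2$. Using $\sum (x_i-\mu)^2 = \sum (x_i-\bar x)^2\{1 + t^2/(n-1)\}$, with $t$ as defined before (6), the integration over $\sigma$ in (4) gives

$$\int_0^\infty \sigma^{-(n+k)}\exp\Big\{-\frac{\sum (x_i-\mu)^2}{2\sigma^2}\Big\}\,d\sigma \;\propto\; \Big\{\sum (x_i-\mu)^2\Big\}^{-(n+k-1)/2} \;\propto\; \Big\{1 + \frac{t^2}{n-1}\Big\}^{-(n+k-1)/2},$$

and expanding the logarithm as in (7),

$$-\frac{n+k-1}{2}\log\Big(1 + \frac{t^2}{n-1}\Big) = -\frac{t^2}{2} + \frac{t^4 - 2k\,t^2}{4(n-1)} + O\big((n-1)^{-2}\big).$$

With $k = 2$ the last term is Edgeworth’s $(t^4-4t^2)/4(n-1)$. With $k = 1$, the prior $d\mu\,d\sigma/\sigma$ usually associated with Jeffreys [8], the exponent becomes $-n/2$, which in terms of $z = t/\sqrt{n-1}$ has the form of Gosset’s (18), and the correction becomes $(t^4-2t^2)/4(n-1)$.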
When we come to consider Gosset’s contribution to the problem of the mean
we shall find him arriving by a different route, not at the same expression as (5)
but at one of similar form, and even then with a very different interpretation.
This paper of Edgeworth’s was not, however, known to Gosset (it has indeed been
largely overlooked by statisticians), and we can only speculate what his reaction
might have been had he seen it.


4. THE CORRELATION COEFFICIENT 


At this point it will be convenient also to describe in a very formal way the 
general inverse probability approach to the problem of the correlation co- 
efficient. 

Suppose $(x_i, y_i)$ are now $n$ pairs of random variables, independent as between
pairs, but each pair following the normal bivariate distribution with means
$\mu_x$, $\mu_y$, variances $\sigma_x^2$, $\sigma_y^2$ and covariance $\rho\sigma_x\sigma_y$. Again denoting the whole sample
by $S$, we have therefore as its distribution for given values of the population
parameters:


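$$f(S\mid\mu_x,\mu_y,\sigma_x,\sigma_y,\rho)\,dS = \exp\Big[-\frac{1}{2(1-\rho^2)}\sum_{i=1}^{n}\Big\{\frac{(x_i-\mu_x)^2}{\sigma_x^2} - \frac{2\rho(x_i-\mu_x)(y_i-\mu_y)}{\sigma_x\sigma_y} + \frac{(y_i-\mu_y)^2}{\sigma_y^2}\Big\}\Big] \times \big\{\sigma_x\sigma_y\sqrt{1-\rho^2}\big\}^{-n}(2\pi)^{-n}\,dx_1\,dy_1\cdots dx_n\,dy_n. \qquad (8)$$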
Then, if the prior distribution function of the parameters is $g(\mu_x,\mu_y,\sigma_x,\sigma_y,\rho)\,d\mu_x\,d\mu_y\,d\sigma_x\,d\sigma_y\,d\rho$,
the joint distribution of sample and parameters is given by
$$h(S,\mu_x,\mu_y,\sigma_x,\sigma_y,\rho) = f(S\mid\mu_x,\mu_y,\sigma_x,\sigma_y,\rho)\,g(\mu_x,\mu_y,\sigma_x,\sigma_y,\rho) \qquad (9)$$


where the differential elements $dS\,d\mu_x\,d\mu_y\,d\sigma_x\,d\sigma_y\,d\rho$ must be appended to each
side of the equation. The posterior distribution of the parameters given $S$ is
proportional to $h(S,\mu_x,\mu_y,\sigma_x,\sigma_y,\rho)$ and hence the posterior distribution of $\rho$
alone is


$$p(\rho\mid S)\,d\rho \propto \iiiint h(S,\mu_x,\mu_y,\sigma_x,\sigma_y,\rho)\,d\mu_x\,d\mu_y\,d\sigma_x\,d\sigma_y\,d\rho. \qquad (10)$$


The result of this analysis for the particular case where $g(\mu_x,\mu_y,\sigma_x,\sigma_y,\rho)$ is
taken to be uniform and where $n$ is large was given in 1898 by K. Pearson and
L. N. G. Filon [11]. If $r$ is the sample correlation coefficient, the posterior distribution
of $\rho$ is then normal with mean $r$ and standard deviation $(1-r^2)n^{-1/2}$.
(Pearson and Filon indeed gave the joint posterior distribution of all the
parameters but this need not concern us for the moment.) Within limits we
can alter $g(\mu_x,\mu_y,\sigma_x,\sigma_y,\rho)$ considerably and still obtain the same large sample
result. If $n$ is not large enough, however, the choice of the form of $g(\mu_x,\mu_y,\sigma_x,\sigma_y,\rho)$
will be critical and all the familiar objections to the method will begin to carry
weight.
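A minimal sketch of the large-sample result quoted above, assuming nothing beyond the normal approximation with mean $r$ and standard deviation $(1-r^2)n^{-1/2}$; the function name and the illustrative values of $r$ and $n$ are hypothetical.

```python
import math
from statistics import NormalDist

def large_sample_limits(r, n, prob=0.5):
    """Central limits for rho under the large-sample normal approximation
    (mean r, standard deviation (1 - r**2) / sqrt(n)).
    Only meaningful when n is large."""
    sd = (1 - r * r) / math.sqrt(n)
    m = NormalDist().inv_cdf(0.5 + prob / 2)   # 0.67449 for prob = 0.5
    return r - m * sd, r + m * sd

r = 0.6                                   # hypothetical sample value
print(large_sample_limits(r, 100))        # n = 100: limits well inside (-1, 1)
print(large_sample_limits(r, 8, 0.95))    # n = 8: the approximation breaks down
```

For $n = 8$ the nominal 95 per cent upper limit exceeds 1, a visible symptom that the large sample result, and with it any argument that the prior no longer matters, has been pushed too far.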


5. GOSSET’S DISCUSSION OF THE CORRELATION COEFFICIENT 


In his 1908 paper on the correlation coefficient [12], Gosset mentions two 
typical questions. (i) He introduces the subject by referring to the problem of 
judging whether an observed $r$ is consistent with an assumed $\rho$ (in his case
$\rho = 0$), but also (ii) he states that “we require the probability that $\rho$ for the
population from which the sample is drawn shall lie between any given limits.”
(Gosset actually uses $R$ for the population correlation coefficient but I have
taken the liberty of changing his $R$ to $\rho$ in the present quotations to conform
with modern usage. The important point is that Gosset did use different symbols
for population and sample statistics.) He continues [12, p. 302],

“It is clear that in order to solve this problem we must know two things: (1) the
distribution of values of $r$ derived from samples of a population which has a given $\rho$,
and (2) the a priori probability that $\rho$ for the population lies between any given
limits. Now (2) can hardly ever be known, so that some arbitrary assumption must
in general be made; when we know (1) it will be time enough to discuss what will be
the best assumption to make, but meanwhile I may suggest two more or less obvious
distributions. The first is that any value is equally likely between +1 and −1, and
the second that the probability that $z$ is the value is proportional to $1 - z^2$: this I
think is more in accordance with ordinary experience: the distribution of a priori
probability would then be expressed by the equation $y = \tfrac34(1 - z^2)$.

But whatever assumption be made, it will be necessary to know (1), so that the 
solution really turns on the distribution of r for samples drawn from the same popu- 
lation.” 


Although he does not produce the sought solution in final mathematical form 
Gosset, by a mixture of empirical and theoretical reasoning which has often 
been admired, succeeds in telling us almost as much about the distribution of 
r as any symbolic expression could convey. However, since he could not write 
down in a short convenient way the expression for $f(r\mid\rho)\,dr$, he was unable to
take the further step envisaged in the above quotation. For to complete his
solution, given a prior probability distribution $g(\rho)\,d\rho$, he would have had to
write down the joint distribution


$$h(r,\rho)\,dr\,d\rho = f(r\mid\rho)\,g(\rho)\,dr\,d\rho \qquad (11)$$
and then the posterior distribution of $\rho$ given $r$
$$p(\rho\mid r)\,d\rho \propto h(r,\rho)\,d\rho \qquad (12)$$


(the factor of proportionality being obtained by integrating out with respect
to $\rho$). From (12), for given observed $r$, he could then have calculated the chance
that $\rho$ lies between any limits that might have been prescribed beforehand and
thus have solved his second problem.
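The calculation (11)–(12) that Gosset envisaged can be sketched on a grid, with his two suggested priors. Because he never had $f(r\mid\rho)$, the sketch substitutes an assumption: Fisher’s later approximation that $\tanh^{-1} r$ is roughly normal with mean $\tanh^{-1}\rho$ and variance $1/(n-3)$. The exact density found by Fisher [6] would change the numbers; function names and the grid size are illustrative.

```python
import math

def fisher_z_likelihood(rho, r, n):
    """Approximate likelihood of rho given r, ASSUMING atanh(r) is normal with
    mean atanh(rho) and variance 1/(n - 3). This stands in for f(r | rho)."""
    d = math.atanh(r) - math.atanh(rho)
    return math.exp(-0.5 * (n - 3) * d * d)

def uniform_prior(rho):
    return 0.5                        # any value equally likely in (-1, 1)

def quadratic_prior(rho):
    return 0.75 * (1 - rho * rho)     # Gosset's y = (3/4)(1 - z^2)

def posterior_prob(r, n, prior, lo, hi, grid=20_000):
    """Pr(lo < rho < hi | r) from (11)-(12), using a midpoint grid on (-1, 1)."""
    step = 2.0 / grid
    num = den = 0.0
    for k in range(grid):
        rho = -1.0 + (k + 0.5) * step
        w = fisher_z_likelihood(rho, r, n) * prior(rho) * step   # h(r, rho) d rho
        den += w                                                  # normalizing constant
        if lo < rho < hi:
            num += w
    return num / den

for prior in (uniform_prior, quadratic_prior):
    print(prior.__name__, round(posterior_prob(0.4, 10, prior, 0.0, 1.0), 4))
```

With a small sample the two priors give visibly different chances that $\rho$ is positive, the $1 - z^2$ prior pulling mass toward zero; this is the arbitrariness the text goes on to discuss.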

The solution of equation (12) is not, however, necessarily the same as that
of equation (10) of our previous section. In (12) we are assuming, in calculating
the posterior distribution of $\rho$, that $r$ is the only feature of the sample that need
be considered, whereas formally (10) implies that we consider all the sample
values, although in virtue of (8) the quantities needed reduce immediately to
$(\bar x, \bar y, s_x, s_y, r)$. To investigate this further let us write


$$f(S\mid\mu_x,\mu_y,\sigma_x,\sigma_y,\rho) = f(r\mid\rho)\,f(S\mid r,\mu_x,\mu_y,\sigma_x,\sigma_y,\rho) \qquad (13)$$


and 


$$g(\mu_x,\mu_y,\sigma_x,\sigma_y,\rho) = g(\rho)\,g(\mu_x,\mu_y,\sigma_x,\sigma_y\mid\rho). \qquad (14)$$
We shall then have from (10), (11), (13) and (14)
$$p(\rho\mid S) \propto h(r,\rho)\iiiint f(S\mid r,\mu_x,\mu_y,\sigma_x,\sigma_y,\rho)\,g(\mu_x,\mu_y,\sigma_x,\sigma_y\mid\rho)\,d\mu_x\,d\mu_y\,d\sigma_x\,d\sigma_y. \qquad (15)$$


If the result of the integrations in (15) does not depend on $\rho$ then $p(\rho\mid S)$ is
the same for all $S$ leading to a given $r$, and therefore (10) and (12) will give the
same posterior distribution of $\rho$. This clearly will not always be the case irrespective
of the form of $g(\mu_x,\mu_y,\sigma_x,\sigma_y\mid\rho)$, but it is not difficult to make a choice
of $g(\mu_x,\mu_y,\sigma_x,\sigma_y\mid\rho)$ for which it will be the case (e.g. Jeffreys [8], p. 152).

I am not concerned to pursue the matter beyond this point at the moment 
for all these assignations of prior probability have an uncomfortable air of con- 
trivance about them and in the present situation there are other obvious 
grounds why r, alone, should enter into the picture. For, starting with the set 
of quantities $(\bar x, \bar y, s_x, s_y, r)$, there is no other function of them which has a
direct distribution depending upon $\rho$ but not depending upon the nuisance
parameters $\mu_x$, $\mu_y$, $\sigma_x$ and $\sigma_y$ in addition to $\rho$. Being, as we are in the situation
which Gosset has in mind, completely ignorant about the values the nuisance
parameters are likely to possess, any rule for making inferences about $\rho$ which
can be expressed in exact probability terms must therefore be based on the
distribution of $r$ alone among the sample quantities available. If there did not
exist a quantity like $r$ depending in its distribution only on the parameter $\rho$
at issue or, if we were concerned rather with probability statements which were 
to be expressed in terms of inequalities the position might conceivably be 
changed, but, as it is, we are fortunate that we can here, as Gosset does, sim- 
plify at the outset and consider a single statistic r alone. 

Even in Gosset’s treatment there still remains the question of the prior 
distribution of p. He would have been forced to give more consideration to this 
if he had actually solved his main problem and found $f(r\mid\rho)$. However, as the
above quotation shows, his assumed forms for $g(\rho)$ are put forward only very
tentatively and he might easily have decided to dispense with them on further 
consideration and have tried what he could do without any assumed prior 
knowledge at all. 


6. GOSSET’S DISCUSSION OF THE MEAN 


Although there are parts of Gosset’s paper on the mean [13] which, as with 
his treatment of the correlation coefficient, suggest an outlook based ultimately 
on inverse probability, there is nowhere explicit reference to the prior functions 
which are an indispensable item in the practical working out of such an ap- 
proach and in places we see indeed a very different outlook taking shape. The 
major part of the paper is concerned with an investigation of the direct proba- 
bility distributions of the quantities 


$$s^2 = \sum (x-\bar x)^2/n \quad\text{and}\quad z = (\bar x-\mu)/s \qquad (16)$$


(We shall maintain Gosset’s definition of $s$ although most of us would use a
divisor $(n-1)$ and prefer to discuss, instead of $z$, the quantity
$$t = \sqrt{n(n-1)}\,(\bar x-\mu)\big/\big[\textstyle\sum (x-\bar x)^2\big]^{1/2} = \sqrt{n-1}\,z, \qquad (17)$$
which tends to have unit standard deviation as $n$ becomes large).
On this occasion Gosset succeeded in writing down the direct distributions 
sought and, in particular, he found 


$$p(z)\,dz \propto (1+z^2)^{-n/2}\,dz. \qquad (18)$$
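Gosset’s tables are the probability integral of (18). A short sketch of how such a table can be computed, assuming nothing beyond (18) itself: substituting $z = \tan\theta$ turns the integrand into $\cos^{n-2}\theta$, which is then integrated numerically. The function name and the midpoint rule are illustrative choices, not Gosset’s procedure.

```python
import math

def z_integral(z, n, steps=20_000):
    """Pr(Z < z) when p(z) dz is proportional to (1 + z**2)**(-n/2) dz, as in (18).
    With z = tan(theta) the integrand becomes cos(theta)**(n - 2) d(theta)."""
    def integral(upper):
        a = -math.pi / 2
        h = (upper - a) / steps
        # midpoint rule on (-pi/2, upper)
        return sum(math.cos(a + (k + 0.5) * h) ** (n - 2) for k in range(steps)) * h
    return integral(math.atan(z)) / integral(math.pi / 2)

for n in (2, 3, 6, 10):
    print(n, round(z_integral(1.0, n), 4))
```

For $n = 2$ the integrand is constant and the result is $0.5 + \tan^{-1}z/\pi$; for $n = 3$ it is $0.5 + \tfrac12\sin\theta$; for $n = 6$ and $z = 1$ it gives about 0.9622, the figure Gosset quotes below.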


As R. A. Fisher ([4], p. 81) has noted, Gosset’s discussion of the particular 
case n=2 is specially interesting. In this case we have the simple expressions 


$$s = |x_1-x_2|/2 \quad\text{and}\quad z = (x_1+x_2-2\mu)/|x_1-x_2| \qquad (19)$$

and
$$p(z)\,dz \propto (1+z^2)^{-1}\,dz. \qquad (20)$$


The chance that $z$ lies between any two values $z_1$ and $z_2$ is
$|\tan^{-1} z_2 - \tan^{-1} z_1|/\pi$,
and in particular the chance is $\tfrac12$ that $z$ lies between $-1$ and $+1$. Symbolically
$$\Pr\{-1 < (x_1+x_2-2\mu)/|x_1-x_2| < 1\} = \tfrac12. \qquad (21)$$
Thence we may deduce that


$$\Pr\{(x_1+x_2) - |x_1-x_2| < 2\mu < (x_1+x_2) + |x_1-x_2|\} = \tfrac12, \qquad (22)$$
i.e.
$$\Pr\{\mu \text{ lies between } x_1 \text{ and } x_2\} = \tfrac12. \qquad (23)$$
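Statement (23) is a statement about repeated sampling with $\mu$ and $\sigma$ held fixed, and a small simulation makes this concrete. The population values, trial count and seed below are arbitrary choices for illustration.

```python
import math
import random

def coverage_between_two(mu=10.0, sigma=3.0, trials=100_000, seed=1908):
    """Fraction of repeated samples of size 2 from N(mu, sigma**2) in which
    mu lies between x1 and x2; (23) says 1/2 whatever mu and sigma are."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x1, x2 = rng.gauss(mu, sigma), rng.gauss(mu, sigma)
        if min(x1, x2) < mu < max(x1, x2):
            hits += 1
    return hits / trials

def cauchy_prob(z1, z2):
    """Chance that z lies between z1 and z2 under (20)."""
    return abs(math.atan(z2) - math.atan(z1)) / math.pi

print(coverage_between_two(), cauchy_prob(-1, 1))
```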


In Gosset’s own words ([13], p. 13), where he is first broaching the question 
of tabulating (18), this deduction is expressed thus: 


“The table for $n = 2$ can be readily constructed by looking out $\theta = \tan^{-1} z$ in
Chambers's tables and then $0.5 + \theta/\pi$ gives the corresponding value.

Similarly $\tfrac12 \sin\theta + 0.5$ gives the values when $n = 3$.

There are two points of interest in the $n = 2$ curve. Here $s$ is equal to half the
distance between the two observations. $\tan^{-1} s/s = \pi/4$, so that between $+s$ and $-s$
lies $2 \times \pi/4 \times 1/\pi$ or half the probability, i.e. if two observations have been made and
we have no other information, it is an even chance that the mean of the (normal)
population will lie between them. On the other hand the second moment coefficient is

$$\frac{1}{\pi}\int_{-\pi/2}^{\pi/2} \tan^2\theta\,d\theta = \frac{1}{\pi}\Big[\tan\theta - \theta\Big]_{-\pi/2}^{\pi/2} = \infty,$$

or the standard deviation is infinite while the probable error is finite.”


Later on, following the short table of the probability integral of z which he 
provides for n = 4(1)10, Gosset again gives expression to a similar interpretation
([13], p. 20):


“The tables give the probability that the value of the mean, measured from the
mean of the population, in terms of the standard deviation of the sample, will lie
between $-\infty$ and $z$. Thus, to take the tables for samples of 6, the probability of the
mean of the population lying between $-\infty$ and once the standard deviation of the
sample is 0.9622, or the odds are about 24 to 1 that the mean of the population lies
between these limits.

The probability is therefore 0.0378 that it is greater than once the standard
deviation and 0.0756 that it lies outside $\pm 1.0$ times the standard deviation.”
In other words the table provides first the information that, for $n = 6$,
$$\Pr\{z = (\bar x-\mu)/s < 1\} = 0.9622 \qquad (24)$$

and then again we may make the transition to an equivalent statement of
which $\mu$ is the subject, viz.

$$\Pr(\mu > \bar x - s) = 0.9622 \qquad (25)$$
and by symmetry this implies
$$\Pr(\mu < \bar x + s) = 0.9622 \approx 24/25. \qquad (26)$$


Furthermore, although $\mu$ has become the subject of statements (23) and (26),
the probability is still a direct one related to hypothetical repeated sampling
from a population with fixed mean $\mu$ and standard deviation $\sigma$. (The two above
quotations are indeed separated in the text by the description of a sampling 
experiment where, among other things, the theoretical expression (18) is tested 
out on empirical material consisting of 750 samples of 4 from a given popula- 
tion). The status of the concept of probability is not changed by the mere 
alterations of emphasis which Gosset makes as he proceeds in these passages 
from one sentence to the next, for what he is saying at this point is deduced all 
the time from the direct distribution of z without the intervention of any 
further principles. 
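That (25) is a direct, repeated-sampling statement can be checked by simulation: draw many samples of 6 from a population with fixed $\mu$ and $\sigma$ and count how often $\mu > \bar x - s$, with Gosset’s divisor $n$ in $s$. The population values, trial count and seed are arbitrary.

```python
import math
import random

def coverage_mu_above(n=6, mu=0.0, sigma=1.0, trials=50_000, seed=13):
    """Fraction of repeated samples of size n from N(mu, sigma**2) in which
    mu > xbar - s, with Gosset's s (divisor n). (25) predicts 0.9622 for n = 6."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(xs) / n
        s = math.sqrt(sum((x - xbar) ** 2 for x in xs) / n)
        if mu > xbar - s:            # equivalent to z = (xbar - mu)/s < 1, as in (24)
            hits += 1
    return hits / trials

print(coverage_mu_above())
print(coverage_mu_above(mu=50.0, sigma=7.0))   # should not depend on mu or sigma
```

The frequency is the same whatever fixed $\mu$ and $\sigma$ are used, which is exactly why no prior distribution is needed to state (25).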


7. APPLICATION TO SPECIFIC EXAMPLES 


Although Gosset was concerned with direct probabilities in the part of his 
paper to which allusion has just been made, as soon as he began to apply his 
results to specific examples he used language that to readers at that time might 
easily have suggested that a posterior probability interpretation in the Bayes- 
Laplace sense was intended. For instance, one of his sets of data relates to the 
additional hours of sleep obtained by 10 patients when given a certain hypnotic 
drug (Treatment 1) compared with the sleep obtained without hypnotic. The 
individual gains were 0.7, —1.6, —0.2, —1.2, —0.1, 3.4, 3.7, 0.8, 0.0, and 2.0. 
These have mean $\bar x = 0.75$ and standard deviation $s$ (using his definition) $= 1.70$.
Of these figures Gosset writes (p. 20), 

“First let us see what is the probability that 1 will on the average give increase of
sleep; i.e. what is the chance that the mean of the population of which these experi- 
ments are a sample is positive. +0.75/1.70 =0.44, and looking out z=0.44 in the 
table for ten experiments we find by interpolating between 0.8697 and 0.9161 that 
0.44 corresponds to 0.8873, or the odds are 0.887 to 0.112 that the mean is positive.” 


This is elliptic. All that Gosset’s developed theory, supported by his tabula- 
tion, had shown was that 


$$\Pr\{z = (\bar x-\mu)/s < 0.44\} = 0.887. \qquad (27)$$
On making the kind of transition described in the previous section this becomes
$$\Pr(\mu > \bar x - 0.44\,s) = 0.887. \qquad (28)$$


Now Gosset’s statement that the chance is 0.887 that $\mu$ is positive can be obtained
from (28) by substituting for $\bar x$ the realized value 0.75 and for $s$ the
realized value 1.70, since $0.75 - 0.44 \times 1.70 \approx 0$. Thus $\mu > 0$ can be regarded as a
realized value of the random inequality $\mu > \bar x - 0.44\,s$, which, in repeated samples,
has a chance 0.887 of being satisfied. One may object that the random quantity $(\bar x - 0.44\,s)$
has been defined with the assistance of the figure 0.44 which is itself derived 
from the realized sample values, but this is a point which I shall not enter into, 
for it is in any case not certain that an interpretation on the present lines was 
what Gosset actually intended, despite some pointers in this direction from 
earlier sections. However, whatever he had in mind, there is no doubt that, 
by many readers, a stated chance that $\mu > 0$ would automatically be regarded
as a posterior probability such as might be deduced if some prior distribution
of $\mu$ and $\sigma$ were available. It is perfectly true that Gosset mentions no such
prior function in the present context and therefore strictly should not be sus- 
pected here of using the classical inverse probability argument. It is also true, 
however, that particularly at the time he was writing, users of the Bayes- 
Laplace method often introduced prior distributions almost by sleight of hand 
and he would not have been out of fashion if he had been doing something 
similar. This practice was relatively innocuous when large samples were avail- 
able, but in Gosset’s work, which was avowedly designed to deal with very 
small samples, a tacit and unreasoned adoption of a particular prior distribution
function could have been fatal to his purpose. Possibly to prevent any
chance of misunderstanding, in later papers he abandoned the present kind of 
statement, as far as I am aware altogether, and gave his conclusions in the form 
of a direct summary of the type: if $\mu = 0$ then a value $z = \bar x/s$ less than that observed
will occur with probability 0.887 and a greater value with probability
0.113, i.e. not sufficiently rarely to throw doubt upon the hypothesis that $\mu = 0$.
Whether such statements, impeccable as they are as deductions from the initial 
assumptions, are in fact ever in themselves sufficient for action is arguable. 
But at least they have the merit of being easily understood. 
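The direct summary for the sleep data can be reproduced in a few lines, assuming only the ten gains listed above and the distribution (18) with $n = 10$. The integral is evaluated numerically rather than by interpolating in a table, so the third decimal may differ slightly from Gosset’s 0.887 and 0.113; the function name is illustrative.

```python
import math

# Additional hours of sleep under Treatment 1, as listed in the text.
gains = [0.7, -1.6, -0.2, -1.2, -0.1, 3.4, 3.7, 0.8, 0.0, 2.0]

def z_integral(z, n, steps=20_000):
    """Pr(Z < z) under (18), via z = tan(theta) and a midpoint rule."""
    def integral(upper):
        a = -math.pi / 2
        h = (upper - a) / steps
        return sum(math.cos(a + (k + 0.5) * h) ** (n - 2) for k in range(steps)) * h
    return integral(math.atan(z)) / integral(math.pi / 2)

n = len(gains)
xbar = sum(gains) / n
s = math.sqrt(sum((x - xbar) ** 2 for x in gains) / n)   # Gosset's divisor n
z_obs = xbar / s                                           # z evaluated at mu = 0

below = z_integral(z_obs, n)     # Pr(z < observed value | mu = 0)
print(f"xbar = {xbar:.2f}, s = {s:.2f}, z = {z_obs:.2f}")
print(f"Pr(z < observed) = {below:.3f}, Pr(z > observed) = {1 - below:.3f}")
```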


8. THE NATURE OF GOSSET’S ACHIEVEMENT 


I have thus far chosen to emphasize the position of these papers of Gosset in 
the context of the views of statistical inference current at his time. I am, how- 
ever, far from wishing to imply that he himself was much concerned with any 
theory of inference, suggestive as some of his remarks may have been on this 
score. He was primarily interested to find the distribution of r and z in direct 
sampling from normal populations and the occasional references he makes to 
the problem of inference are, perhaps, no more than an acknowledgment of its 
existence. It is then as an extension of our knowledge of direct sampling distri- 
butions that he would have wished the 1908 papers to be assessed. If, as it 
happens, I have said very little above about the actual derivations, it is only 
because the facts are so well known. There may be readers who are not greatly 
impressed by the papers on account of the incompleteness of the mathematical 
proofs which are given, but the final verdict of mathematical statisticians will, 
I believe, be that they have lasting value. They have the rare quality of showing 
us how an exceptional man was able to make mathematical progress without 
paying too much regard to the rules. He fortified what he knew with some 
tentative guessing, but this was backed by subsequent careful testing of his 
results. In this he exemplifies an attitude more common, perhaps, among mathematical
innovators than they sometimes care to admit. We have become accustomed
to-day to a standard of published mathematical proof which can
hide rather than reveal the actual process by which discoveries are made. With 
Gosset on the other hand, we can almost observe his initial thinking, whilst the 
nature of the final proof is secondary provided only it is sufficient to convince 
us that the results are right. 


9. SUBSEQUENT DEVELOPMENTS 


The successful generalization of the ‘Student’ distribution which forms the 
basis of so much statistical work in the modern period, was, as is well known, 
provided by R. A. Fisher. If I may venture to express a particular preference 
among his papers, it is for one which was published in 1925 [3] in which the 
whole theory is very succinctly developed. Fisher showed that it applied to the 
most general situation in “least squares” where observations are interpreted as 
being equal to linear functions of parameters plus random normal errors whose 
sampling variances are proportional to known numbers (but where the actual 
scale of residual variance has to be estimated from the minimized sum of 
squares). Also about this time Fisher published the book [5] which was to 
become a classic and which exploits the “general linear hypothesis” in a variety 
of experimental situations very different from the typical ones encountered in 
physics and astronomy. In this field of application Gosset also had made a 
great deal of the running but the conduct of the general advance now lay in 
Fisher’s hands, and the impetus which he then gave to the subject is far from 
being exhausted. 

Further work on the correlation coefficient should also be mentioned. As we
noted above, Gosset did not succeed in discovering an explicit form for the
distribution of r in normal samples. Fisher in 1915 [6] was, however, successful, 
although the complexity of the result was such that it was not surprising that 
Gosset’s unorthodox approach had failed to reveal it. Later Fisher was to 
provide also simplifying approximations to the distribution and to make the 
generalization to partial correlations and, in effect, to write yet another 
chapter in the history of the development of statistical methods. 


10. EFFECTS OF NON-NORMALITY 


Anyone who works in this field of normal small sample theory must reflect 
at some stage on the importance or otherwise of the “assumption” of normality 
in the populations sampled. The reasons which have been put forward from 
time to time for making this assumption are not wholly convincing but are 
worthy of some notice: 

(i) It is said that many empirical populations are in fact Gaussian. We can, 
I think, accept that to a good approximation this is so, or becomes so, by some 
simple transformation of the variables. Nevertheless the onus would always 
seem to be on the experimenter to produce positive evidence that the Gaussian 
assumption is in a general way applicable in the particular field in which he is 
operating. 

(ii) Gosset expressed the opinion that there might be fields of inquiry where 
skewed populations are expected but where the direction of skewness would
not be known beforehand. He seemed to envisage the possibility of a distribution
of skewness, equally likely positive and negative (whether in an actual
superpopulation or just in some logical sense is not clear), and by this means
the normal theory distribution of $z$ might be maintained (cf. the letter quoted by
E. S. Pearson [10], p. 245). This, as it stands, is too vague to be of much
assistance.

(iii) We often wish to draw deductions of a symmetrical kind (e.g. confidence 
limits equally spaced about the mean) and moderate skewness in the popula- 
tion does not affect such statements seriously. This is true but almost as often 
we wish to make “one-tailed” statements which are affected by skewness. 

(iv) There is as a rule no other simpler mathematical assumption than the 
Gaussian in better accordance with the empirical facts for which at the same 
time the sampling theory has been worked out in such complete form. Here, 
perhaps, we are coming close to the real reason why normal theory holds the 
position it does, but it is not a reason which is convincing to a person who 
questions the necessity of using any small sample theory at all. 

In trying to give weight to these pros and cons it may be helpful to recall
exactly how the presence of skewness in a population influences the distribution
of the quantity $t = \sqrt{n}(\bar{x} - \mu)/s$, using now our standard definition.
We may note first that, if n is very large, t tends to be normally distributed
irrespective of the form of the population sampled (excepting some extreme
cases). If n is only moderately large, however, M. S. Bartlett [1] has shown
that the distribution of t differs from the standard normal theory t distribution
by an amount of order $n^{-1/2}$. But the normal theory t itself differs from the
unit Gaussian distribution only by an amount of order $n^{-1}$. If, therefore, we
decide to ignore the influence of skewness on the t distribution, we might well
go further and act as if t were unit Gaussian. If this position were accepted we
would, of course, be returning to the use of the ubiquitous figure 0.67449 for
determining probable error from an estimated standard error, despite the fact
that the latter may be based on only a moderate number of degrees of freedom.
The normal theory t-multiple will undoubtedly constitute a refinement if we are
actually sampling from a Gaussian population, but otherwise it is difficult to
see how we can press its use upon recalcitrant statisticians who say that they
have no confidence that their data are Gaussian and that therefore, for sim-
plicity, they are content to use with small samples the multiples which they
know are at least valid with large ones. We may reply that there is little chance
of making things much worse by using a normal theory t-multiple rather than
the unit Gaussian multiple, but we can give no positive assurance that there
will be gain.
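The size of the "refinement" at issue can be seen by comparing the Gaussian probable-error factor 0.67449 (the 0.75 quantile of the unit Gaussian) with the corresponding normal-theory t-multiple (the 0.75 quantile of t) for a few degrees of freedom. The sketch below computes t quantiles with only the Python standard library, by Simpson integration of the t density and bisection; the helper names are hypothetical and the numerical method is an illustrative choice, not one used in the paper.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of Student's t with df degrees of freedom (lgamma avoids overflow)."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, steps=1000):
    """P(T <= x) for x >= 0: Simpson's rule on [0, x] plus the lower half (0.5)."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    return 0.5 + total * h / 3

def t_quantile(p, df):
    """Quantile of Student's t for 0.5 < p < 1, by bisection on t_cdf."""
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < p:
        hi *= 2
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

gauss = NormalDist().inv_cdf(0.75)  # 0.67449..., the classical probable-error factor
for df in (4, 9, 19, 49):
    # the normal-theory t-multiple for a probable error is the 0.75 quantile of t
    print(df, round(t_quantile(0.75, df), 4), round(gauss, 4))
```

The t-multiple always exceeds 0.67449 and approaches it as the degrees of freedom grow, which is the order $n^{-1}$ effect described above.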


11. CONCLUSION 


The expression of inferences from sample to population means in terms of 
probability has never been free from an admixture of arbitrary elements, e.g. 
(i) the nature of the law of “facility of error” (or the form to be assumed for 
the population distribution), and (ii) the nature of prior functions in the Bayes-
Laplace method. It was early shown that the effects of this arbitrariness dis-
appeared if samples were large enough, but the development of a specific small
sample theory was still inhibited. In the modern period, often dated from 1908, 
we have seen a gradual abandonment of inverse probability arguments and 
attempts to confine conclusions to those which may be deduced from direct dis- 
tributional facts. Whether we believe, with some, that inverse probability has 
finally been scotched or, with others, that a stroke of inverse probability will 
always be required at some point, we must note that a large part of the de- 
velopment of the normal small sample theory, at least in the twenty years 
following 1908 when the immediate influence of Gosset was being felt, was 
rendered possible by the removal from the argument of the arbitrariness 
associated with the postulation of particular prior distribution functions of 
parameters. Without prejudging the success or otherwise of these developments 
as providing a sufficient basis for probability inference, and without attempting 
to evaluate what has been written on inference since 1928, we can still un- 
reservedly commemorate in Gosset a man who played an outstanding part in 
contributing to our understanding of these questions. 

The source of arbitrariness associated with the assumption of normality in 
the population remains, however, whatever our general views on inference 
may be. The standard ‘Student’ theory is an unqualified improvement on large 
sample theory only if the populations sampled are close to the Gaussian form. 
ON GROUPING FOR MAXIMUM HOMOGENEITY*


Walter D. Fisher
Kansas State College 


Given a set of arbitrary numbers, what is a practical procedure for 
grouping them so that the variance within groups is minimized? An 
answer to this question, including a description of an automatic com- 
puter program, is given for problems up to the size where 200 numbers 
are to be placed in 10 groups. Two basic types of problem are discussed 
and illustrated. 


1. INTRODUCTION


Statisticians are often interested in defining homogeneous groups. Measures
of precision of point estimates partly depend on homogeneity within strata 
from which samples are taken. Tests of significant differences are based on 
comparisons that also involve such homogeneity within strata, as well as dif- 
ferences between them. Apart from sampling or inference problems, it is often 
important to know how a population may be decomposed into sub-groups that 
contrast sharply with each other, individuals of the same group being fairly 
alike. 

This paper deals with the following problem from the viewpoint of statistical
description: given a set of K elements, each element having assigned to it a
weight, $w_i$, and a numerical measure, $a_i$, and given a positive integer G that
is less than K, to find a systematic and practical procedure for grouping the K
elements into G mutually exclusive and exhaustive subsets such that the
weighted sum of squares

$$D = \sum_{i=1}^{K} w_i (a_i - \bar{a}_i)^2 \qquad (1)$$

is minimized, where $\bar{a}_i$ denotes the weighted arithmetic mean of those a's
that are assigned to the subset to which element i is assigned. This problem will
be called a grouping problem. The D value, well known as the sum of squares
within groups in the sense of the analysis of variance, will here be called
squared distance. A system of grouping is often called a partition, and a partition
associated with the minimum squared distance D will be called a least squares
partition.
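Equation (1) can be evaluated directly for any proposed grouping. The sketch below is a minimal illustration, assuming positive weights; the function name `squared_distance` and the label-per-element representation of a partition are hypothetical conveniences, not notation from the paper.

```python
from collections import defaultdict

def squared_distance(weights, values, labels):
    """Weighted within-group sum of squares D of equation (1).

    labels[i] names the subset to which element i is assigned."""
    tot_w = defaultdict(float)
    tot_wa = defaultdict(float)
    for w, a, g in zip(weights, values, labels):
        tot_w[g] += w
        tot_wa[g] += w * a
    # weighted mean of each subset; element i is measured against the mean of its own subset
    means = {g: tot_wa[g] / tot_w[g] for g in tot_w}
    return sum(w * (a - means[g]) ** 2 for w, a, g in zip(weights, values, labels))

w = [1, 1, 2, 1]
a = [1.0, 2.0, 10.0, 12.0]
print(squared_distance(w, a, ["low", "low", "high", "high"]))
```

Here the "low" subset contributes 0.5 and the "high" subset (mean 32/3) contributes 8/3, so D = 19/6.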

Two subclasses of the grouping problem are distinguished: (1) the unre- 
stricted problem, where no restrictions or side conditions are imposed on the 
partitions allowed; and (2) the restricted problem, where such conditions are im- 
posed a priori on the basis of previous knowledge, theory, or for convenience. 
The relevance of each type of problem will be illustrated by an example from
the literature, and computational methods for handling the simpler types of
problem will be presented.

* This paper is the result of work supported by the Social Science Research Council and by the Bureau of

¹ An equivalent geometric problem is: given K weighted points on a straight line, to group the points into G
groups so that the sum of squared distances of the individual points from their group centers of gravity is minimized.

An analogous problem for the case of a continuous frequency distribution has
been investigated by Dalenius [3], [5] and Dalenius and Gurney [4]. The
special case of the normal distribution has been considered in a recent note by
Cox [2]. The methods suggested by these writers are useful in the discrete prob-
lem considered here when the number of individuals is large, when their dis-
tribution can be approximated by a fairly simple continuous curve, when the
number of groups G is fairly small (say five or less), and when no side condi-
tions are put on the admissible partitions. Otherwise, the approach of the
present paper is believed to be preferable.

The position taken here that the $w_i$ and $a_i$ are given and known with complete
certainty entails a descriptive or non-stochastic approach; yet this approach
leads to sampling and other stochastic applications. It is moreover assumed
without attempt at justification here that the measure of homogeneity used,
D, formed by adding squared deviations, is useful and relevant to many prac-
tical problems.²


2. THE UNRESTRICTED PROBLEM 


The unrestricted problem may be easily understood by considering a familiar
situation. Assume that it is desired to find the best method of choosing a given
number of strata for proportional-stratified sampling when information is
available regarding the relevant variable in the population.³ In their discussion
of stratified sampling Hansen, Hurwitz and Madow [8] present a problem with
data on income levels of Atlanta families, based on a previous study by
Mendershausen [9]. A frequency distribution of these families, grouped into ten
income classes, is shown in Fig. 791. The problem is to combine the ten classes
into three larger strata so that the estimate of mean income for all families,
based on a stratified sample, has a small variance.⁴ Various strata and various
methods of sampling are suggested. Here attention will be confined to the
various possible combinations of the original classes into strata, assuming
proportional allocation of sample numbers between the three strata and ran-
dom sampling within each stratum. It is also assumed that the sample mean is
taken as the estimate of the overall population mean. It has been shown, and
is well known, that under these conditions, if $w_i$ denotes the weight of income
class i in the population, $a_i$ denotes the mean income of income class i, and
$\bar{a}_i$ denotes the mean income of the stratum to which income class i is
assigned, then the variance of the estimate is proportional to D as given by
equation (1) plus a constant representing the variance within the original
classes. To minimize the variance of the estimate it is sufficient to minimize D,
which represents the variance of the ten income class means within the three
strata. If we replace “income class” by the word “element,” the original
grouping problem of the paper has been restated.

² Savage has given a general theoretical argument in support of the squared error criterion for statistical de-
cision and estimation problems [10, Ch. 15]. In a previous article [6] the present writer deduced the squared error
criterion from a specific economic decision problem, in which uncertainty was also introduced.

³ Strictly speaking, the problem to be discussed is based upon the assumption that the stratification variable is
identical with the variable to be estimated. The stratification method is useful, without this assumption, when an
a priori stratification variable can be found that is highly correlated with the variable to be estimated.

⁴ See [8], Exercises 17.2 to 17.5 inclusive. In our Fig. 791 two of the original eleven groups having nearly the
same mean have been combined for graphic convenience, the combination being immaterial to the present problem.

The solution to this problem, not obvious from a visual inspection of Fig. 791,
happens to be the following one. Numbering the original ten classes from
low to high income (from left to right in Fig. 791), put classes 1 to 6 in one
stratum, 7 to 9 in another, and 10 in a third by itself. This particular method of
stratification is not mentioned by Hansen, Hurwitz and Madow, nor by Dalenius
and Gurney, who also discuss this same example.⁵ In this small problem
it is quite feasible to find the solution by hand computation. It is intuitively
obvious, and it can be proved, that when the ten original classes are ordered
according to income (i < j when $a_i < a_j$), the only partitions that need be
considered are contiguous partitions, defined for a set of completely ordered
elements as a partition that consists entirely of subsets satisfying the following
condition: if elements i, j, and k have the order i < j < k, and if elements i and k
are assigned to the same subset, then element j must also be assigned to that
same subset.⁶ To find the optimal grouping it is therefore sufficient to compute
the D values for each of the 36 possible contiguous partitions of ten elements
into 3 groups, and then select one with minimum D.⁷

Fig. 791. Atlanta families by income in 1933.











⁵ See [8] and [4], pp. 144-146. These writers consider some alternative solutions, including that obtained under
conditions of optimal allocation of sampling numbers. For both problems values of the variance function for “nearly
optimal” stratifications do not differ greatly from each other.

⁶ Proof that a least squares partition is always contiguous is given in the Appendix.

⁷ The number 36 is obtained from the formula in the next paragraph. The solution to this problem was actually
obtained by an automatic computation to be mentioned in Section 4 below, and checked by hand computation.
The general unrestricted problem of K elements into G groups can, by the
same reasoning, be reduced to a consideration of $\binom{K-1}{G-1}$ contiguous partitions.⁸
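For a problem as small as the Atlanta example, complete enumeration is easy to sketch. The code below generates every contiguous partition of K ordered elements into G subsets by choosing G − 1 division points among the K − 1 gaps, and, as a check on the contiguity result proved in the Appendix, compares the best contiguous partition with an exhaustive search over all partitions on a tiny made-up data set. The function names and the data are hypothetical.

```python
import itertools
import math

def partition_D(w, a, labels):
    """Squared distance D of equation (1) for any assignment labels[i] -> subset."""
    D = 0.0
    for g in set(labels):
        idx = [i for i, lab in enumerate(labels) if lab == g]
        W = sum(w[i] for i in idx)
        mean = sum(w[i] * a[i] for i in idx) / W
        D += sum(w[i] * (a[i] - mean) ** 2 for i in idx)
    return D

def contiguous_partitions(K, G):
    """Yield every contiguous partition of ordered elements 0..K-1 into G subsets,
    as a label list; the G-1 division points are chosen among the K-1 gaps."""
    for cuts in itertools.combinations(range(1, K), G - 1):
        labels, g = [], 0
        for i in range(K):
            if g < G - 1 and i == cuts[g]:
                g += 1  # crossed a division point: start the next subset
            labels.append(g)
        yield labels

def best_contiguous(w, a, G):
    """Exhaustive search over contiguous partitions (elements sorted by a)."""
    return min(partition_D(w, a, lab) for lab in contiguous_partitions(len(a), G))

def best_any(w, a, G):
    """Exhaustive search over all partitions into exactly G non-empty subsets."""
    return min(partition_D(w, a, lab)
               for lab in itertools.product(range(G), repeat=len(a))
               if len(set(lab)) == G)

print(sum(1 for _ in contiguous_partitions(10, 3)), math.comb(9, 2))  # 36 36
w = [1.0, 2.0, 1.0, 1.5, 1.0, 0.5, 1.0]
a = [1.0, 1.5, 4.0, 4.2, 7.0, 9.0, 9.5]
print(best_contiguous(w, a, 3), best_any(w, a, 3))  # equal, as the Appendix proves
```

The exhaustive `best_any` search grows as $G^K$ and is only usable as a check on toy data; the contiguous search grows as $\binom{K-1}{G-1}$.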


3. THE RESTRICTED PROBLEM 


In the preceding example the investigator, in solving the grouping problem, 
was at liberty to take the preliminary step of ordering the ten original classes 
by income level, and then seek a contiguous partition of these ten elements so 
ordered that minimized the squared distance. Assume for the moment, how-
ever, that it was desired to “respect” some a priori ordering of these elements
which might be different from the income order. In other words, while still
seeking a minimal squared distance, a solution would be considered admissible 
only if it were a contiguous partition of the elements ordered in the a priori 
manner. For example, suppose that the ten original elements were families 
living on the same street in a known order of location, and that it is still desired 
to create three groups of families with maximum homogeneity measured in 
terms of income, but that it is also desired that these groups all be contiguous 
in terms of location. This latter condition of the problem may be regarded as 
a side condition or restriction imposed on the minimization of D. It is obvious 
that this problem is different from the old one; it is possible that the solution 
to it may not attain as low a D as the solution to the old one. 
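The street example can be tried on made-up numbers. In the sketch below the ten incomes, the equal weights and the function name `best_in_order` are all hypothetical. The same exhaustive search over contiguous partitions is run once on the families in their (assumed) house order, which is the restricted problem, and once on the incomes sorted, which is the unrestricted one.

```python
import itertools

def ss(vals):
    """Unweighted sum of squared deviations about the mean."""
    m = sum(vals) / len(vals)
    return sum((v - m) ** 2 for v in vals)

def best_in_order(vals, G):
    """Least squares contiguous partition of vals in the order given.
    Returns (D, division points)."""
    best = None
    for cuts in itertools.combinations(range(1, len(vals)), G - 1):
        b = (0, *cuts, len(vals))
        D = sum(ss(vals[i:j]) for i, j in zip(b[:-1], b[1:]))
        if best is None or D < best[0]:
            best = (D, cuts)
    return best

# Hypothetical incomes of ten families listed in house order along a street
incomes = [12, 30, 14, 33, 31, 13, 50, 52, 29, 15]
restricted = best_in_order(incomes, 3)            # groups must be runs of neighbours
unrestricted = best_in_order(sorted(incomes), 3)  # groups may be any income ranges
print(restricted[0], unrestricted[0])
```

Because every partition admissible under the street ordering is also admissible in the unrestricted problem, the restricted minimum can never be lower; with these interleaved incomes it is much higher.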

Other types of side conditions may be imagined. It may be required that the 
solution involve only a partial ordering of the elements with respect to some 
criterion, as contrasted with a complete ordering. The given elements could be 
associated with points in some space of more than one dimension, apart from 
the values of the $a_i$, and mathematical restrictions could be imposed on the
coordinates of these points.⁹ Certain partitions of the elements may be barred
explicitly, irrespective of any concept of ordering or spatial location. The 
grouping problem defined in the second paragraph of this paper will be called a 
restricted problem if any a priori restrictions whatever are placed on the set of 
partitions of the K elements into G subsets that are regarded as admissible for 
a solution (other than the requirement that the subsets must be mutually ex- 
clusive and exhaustive). 

Most practical problems will be of the restricted type, since the investigator 
will almost always wish to inject prior knowledge or factors of convenience into
the conditions of the grouping. In fact, the class of restricted problems is so 
large that a general approach seems extremely difficult if not impossible. Even a 
definition of the major categories of restrictions that seem to be significant for 
practical applications is beyond the scope of this paper. It will suffice to present 
a larger numerical example that illustrates the solution of a problem having the 
simple type of restriction first mentioned: a complete a priori ordering of the
elements that is different from the numerical order of the $a_i$.

⁸ A contiguous partition of K completely ordered elements into G subsets may be represented by G − 1 points
of division lying in any of the K − 1 intervals between adjacent elements, imagined to lie on a line in the specified
order. The number of possible contiguous partitions will therefore equal the number of ways of choosing the division
points, which is the number of combinations of K − 1 different things taken G − 1 at a time.

⁹ The notion of “property space” is applicable to such a scheme. See [1]. The numerical example in [6] includes
certain restrictions on the K elements in a three-dimensional property space.

In the discussion of time series in their textbook, Wallis and Roberts [11]
present an example of the change of lake levels over a time span of 96 years.
Their graph, reproduced in Fig. 793, suggests certain epochs when the lake level
was high and others when it was lower, although no obvious regularity or
periodicity is apparent. Their analysis of the phenomenon is largely in terms
of runs and moving averages. Suppose that it were desired to define G epochs
such that the variation of lake level within epochs, as defined by squared dis-
tance D, is minimized. It is of course required that each epoch comprise only
consecutive years in time: this is the a priori ordering. Then a restricted prob-
lem of the type formulated above results, with all weights $w_i$ equal to 1. The
solution must be a contiguous partition in terms of the ordering in time, not
necessarily the ordering according to lake level.

Fig. 793. Lake Michigan-Huron highest monthly mean level, 1860-1955.

The solution to the problem for G ranging from 1 to 10 is given in Table 794.
This solution is provided by an automatic computer program to be mentioned
in the next section. Alternative values for G are listed in the first column. The
minimized values of D are listed in the second column. Each row of the tri-
angular array headed “P” identifies an optimal partition that yields a solution
to the problem for the G value of that row, giving the minimized D value of that
row. The optimal partition is identified by the order number of the highest-
order element of each subset, except the last (highest), which is always 96. For
example, for G = 3 the three epochs are: years 1 to 30, years 31 to 61, and years
62 to 96. For G = 1 the solution is of course trivial; the relevant partition is
simply the original set, and the D value is the sum of squared deviations from
the general mean. It should also be remarked that this program does not yield
multiple solutions when they exist. The program is discussed further below.
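The compact "P" convention of Table 794 (the last element of every subset except the final one) expands mechanically into explicit epochs. The helper below is a hypothetical illustration of that decoding, using the G = 3 row quoted in the text.

```python
def epochs_from_row(breaks, K):
    """Expand a row of the 'P' array (1-based last element of every subset
    except the final one) into inclusive (first, last) ranges."""
    starts = [1] + [b + 1 for b in breaks]
    ends = list(breaks) + [K]
    return list(zip(starts, ends))

print(epochs_from_row([30, 61], 96))  # [(1, 30), (31, 61), (62, 96)]
```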


TABLE 794 


AUTOMATIC COMPUTER SOLUTION TO LAKE MICHIGAN-HURON PROBLEM


4. METHODS OF SOLUTION


For small unrestricted problems, say of the order of $K \le 20$, $G \le 5$, it is
feasible to obtain the solution by complete enumeration of all possible con-
tiguous partitions, hand computation of D, and selection of a partition hav-
ing minimal D. Some restricted problems of the same order will yield to the
same treatment. For the restricted problem the number of admissible partitions
for which D values must be computed may be less than for an unrestricted prob-
lem of the same size, but the task of applying the restrictions to obtain admis-
sible partitions consumes additional time.

For some unrestricted problems where K is large but G still small it will be
possible to obtain the solution or a near-solution by visual inspection of the
frequency distribution of the $a_i$, ordered according to their magnitudes. Divi-
sions between groups may be placed where the data are sparse or the weights of
small magnitude. This principle cannot be so readily applied when the number
of such regions of sparse data does not correspond with the number of divisions
to be made. When the frequency distribution of the $a_i$ can be represented or
closely approximated by a continuous function of a fairly simple form (say,
by one that is not multi-modal), the method of Dalenius [3], [5] can be ap-
plied.¹⁰
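The necessary condition behind Dalenius' method, that each division point lie midway between the means of the two adjacent subsets, can be imposed on discrete data by a simple fixed-point iteration. The sketch below is an illustration in that spirit only, not Dalenius' own procedure (which is stated for continuous distributions); it assumes equal weights, two groups, and a hypothetical tri-modal data set. Started from different trial divisions it settles on different groupings, and only one of them has the minimum D.

```python
def split(values, boundary):
    """Divide 1-D data into the values below and at/above the division point."""
    return ([v for v in values if v < boundary],
            [v for v in values if v >= boundary])

def ss(group):
    """Sum of squared deviations about the group mean (equal weights)."""
    m = sum(group) / len(group)
    return sum((v - m) ** 2 for v in group)

def midpoint_iteration(values, boundary, max_iter=100):
    """Move the division point to the midpoint of the two group means until the
    groups stop changing. Assumes both groups stay non-empty. Returns (boundary, D)."""
    low, high = split(values, boundary)
    for _ in range(max_iter):
        boundary = (sum(low) / len(low) + sum(high) / len(high)) / 2
        new_low, new_high = split(values, boundary)
        if new_low == low:  # membership unchanged: the midpoint condition holds
            break
        low, high = new_low, new_high
    return boundary, ss(low) + ss(high)

# Hypothetical tri-modal data: 10 values at 0, 10 at 5, 14 at 10
data = [0.0] * 10 + [5.0] * 10 + [10.0] * 14
print(midpoint_iteration(data, 2.0))  # settles on {0} | {5, 10}
print(midpoint_iteration(data, 8.0))  # settles on {0, 5} | {10}, with lower D
```

Both end points satisfy the midpoint condition, yet the first has D ≈ 145.8 and the second D = 125, so the condition alone does not identify the least squares division.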

¹⁰ This method is based upon the principle that for a continuous frequency distribution a necessary condition for
minimum D is that any point of division between two adjacent subsets be equidistant from the means of those two
subsets. Dalenius outlines an iterative method for attaining this condition from an initial trial division. It has not
yet been shown, however, for what class of frequency distributions this necessary condition is also sufficient; and
examples can be found for which the condition is not sufficient, even if the usual conditions on the derivatives for
a minimum D are also assumed. For example, if the given frequency distribution has extreme tri-modality, the D
function for a division into two groups may have two local minima, either one of which may be approached by
Dalenius’ iterative procedure, and so the minimum minimorum may have to be ascertained by further examination.
Dalenius has acknowledged this fact [5, p. 165].

For the general problem with arbitrary distributions of the $a_i$ and with larger
K or G, when the special devices noted above are not applicable, a combina-
torial approach seems indicated.¹¹ In principle, since the number of possible
partitions of K elements into G subsets is finite, it would still be possible to find
the solution by consideration of each partition and selection of the one (or
those) having minimum D. Unless this approach is modified, however, the
number of combinations becomes so large (as indicated by the formula proved
in Footnote 8) that consideration of all possibilities becomes impractical, even
for the fastest digital computers now in existence. For example, the number of
contiguous partitions for a problem of the size of the Lake Michigan-Huron
problem, where K = 96, G = 10, is slightly over one trillion. A high-speed
Princeton-type computer, computing D values at the rate of 10 milliseconds
per partition and working 24 hours a day, would require more than 280 years
to compare these partitions one by one.
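The arithmetic behind this estimate follows from the count $\binom{K-1}{G-1}$ of contiguous partitions. The few lines below simply recompute it for K = 96, G = 10 at the quoted rate of 10 milliseconds per partition, taking a year as 365.25 days.

```python
import math

K, G = 96, 10
n_partitions = math.comb(K - 1, G - 1)   # contiguous partitions, C(K-1, G-1)
seconds = n_partitions * 0.010           # 10 milliseconds per partition
years = seconds / (365.25 * 24 * 3600)   # round-the-clock operation
print(f"{n_partitions:,} partitions, about {years:.0f} years")
```

The count is a little over $10^{12}$, and the running time comfortably exceeds the 280 years stated in the text.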

Because of the additive character of the squared distance function, however, 
it is possible to reduce the computations very substantially by the use of sub- 
optimization procedures. Such procedures are implied by the following lemma. 


Suboptimization Lemma: If $A_1 : A_2$ denotes a partition of set A into two
disjoint subsets $A_1$ and $A_2$, if $P_1^*$ denotes a least squares partition of $A_1$
into $G_1$ subsets, and if $P_2^*$ denotes a least squares partition of $A_2$ into $G_2$
subsets, then, of the class of subpartitions of $A_1 : A_2$ employing $G_1$ subsets
over $A_1$ and $G_2$ subsets over $A_2$, a least squares subpartition¹² is $P_1^* : P_2^*$.


In other words, once a least squares partition over set $A_1$ has been found, this
work need not be done over again when testing for various partitions over $A_2$,
provided that suitable records are kept. It is apparent that application of this
lemma makes it possible to avoid separate consideration of many possible
partitions of the entire set of elements. The extent of the saving of time is in-
dicated by the fact that when the lemma was applied, the solution of the Lake
Michigan-Huron problem for all G from 1 to 10 was actually obtained in 3
minutes.

The lemma will also hold if “least squares partition” is found under side con- 
ditions on admissible partitions, and hence is applicable to restricted problems. 
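The program itself is not reproduced in the paper. The sketch below is a modern dynamic-programming reading of the Suboptimization Lemma for completely ordered elements, offered as an illustration under stated assumptions rather than as the author's code: positive weights, elements already in the required order, and a hypothetical function name. The best split of the first k elements into g subsets is taken as the best split of the first j elements into g − 1 subsets (recorded from the previous stage) plus the single subset j + 1, …, k, minimized over j. Results are reported in the same "P" convention as Table 794.

```python
def least_squares_partitions(w, a, max_groups):
    """Optimal contiguous partitions of ordered elements for G = 1..max_groups.

    Returns {G: (D, breaks)}, where breaks lists the 1-based last element of
    every subset except the final one."""
    K = len(a)
    # prefix sums give each segment's weighted sum of squares in O(1)
    W, S, Q = [0.0] * (K + 1), [0.0] * (K + 1), [0.0] * (K + 1)
    for i, (wi, ai) in enumerate(zip(w, a), start=1):
        W[i] = W[i - 1] + wi
        S[i] = S[i - 1] + wi * ai
        Q[i] = Q[i - 1] + wi * ai * ai

    def seg(j, k):
        """D for the single subset made of elements j+1..k (1-based)."""
        sw, sa = W[k] - W[j], S[k] - S[j]
        return (Q[k] - Q[j]) - sa * sa / sw

    INF = float("inf")
    best = [[INF] * (K + 1) for _ in range(max_groups + 1)]
    back = [[0] * (K + 1) for _ in range(max_groups + 1)]
    for k in range(1, K + 1):
        best[1][k] = seg(0, k)
    for g in range(2, max_groups + 1):
        for k in range(g, K + 1):
            for j in range(g - 1, k):
                # lemma: reuse the recorded optimum for the first j elements
                cand = best[g - 1][j] + seg(j, k)
                if cand < best[g][k]:
                    best[g][k], back[g][k] = cand, j
    result = {}
    for g in range(1, max_groups + 1):
        breaks, k = [], K
        for h in range(g, 1, -1):  # walk back through the recorded division points
            k = back[h][k]
            breaks.append(k)
        result[g] = (best[g][K], sorted(breaks))
    return result

res = least_squares_partitions([1.0] * 8, [1, 2, 2, 9, 10, 10, 20, 21], 3)
print(res[3][1])  # [3, 6]
```

The work grows roughly as $G K^2$ rather than as $\binom{K-1}{G-1}$, which is the kind of saving the lemma makes possible; like the program described below, this sketch reports only one optimal partition when several exist.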

¹¹ It has been pointed out to the writer by George B. Dantzig in correspondence that the unrestricted grouping
problem can be formulated as a non-linear programming problem by the use of special variables that assign the ele-
ments to groups. The usefulness of this parallel is limited, however, by the non-availability of computational algo-
rithms for the type of programming problem where a strictly concave objective function is to be minimized on a convex
set. Allowing for fractional assignment, let $x_{hi}$ denote the fractional part of $a_i$ that is assigned to group h
($h = 1, \ldots, G$; $i = 1, \ldots, K$), set $\bar{a}_h = \sum_i x_{hi} w_i a_i / \sum_i x_{hi} w_i$ and
$S = \sum_{h=1}^{G} \sum_{i=1}^{K} x_{hi} w_i (a_i - \bar{a}_h)^2$, and consider the problem of minimizing
S subject to the constraints $x_{hi} \ge 0$ and $\sum_{h=1}^{G} x_{hi} = 1$. With given a's and w's, S is strictly concave
in the GK-dimensional space of the $x_{hi}$, the constraint set is a convex polyhedron in this same space, and minimum S
is attained only at extreme points of this constraint set. An extreme point corresponds with a matrix $[x_{hi}]$ having a
single unit element in each column, all other elements being zero ($x_{hi}$ being unity when $a_i$ is “completely”
assigned to group h, and zero when $a_i$ is not assigned to group h). Then the problem is precisely equivalent to the
grouping problem of this paper, $\bar{a}_h$ becomes a group mean, S becomes equivalent to our D, and for a solution no
fractional assignment is possible. Moreover, the attainment of a local minimum, in the sense that D cannot be lowered
by changing the assignment of any single $a_i$, does not guarantee that the absolute minimum has been attained. All
this emphasizes that the problem is essentially a combinatorial one.

¹² Proof: Let $P_1$ and $P_2$ denote partitions of $A_1$ and $A_2$ into $G_1$ and $G_2$ subsets respectively. Let
$D_1$, $D_2$, $D_{12}$, $D_1^*$, $D_2^*$, $D_{12}^*$ denote the squared distances associated with partitions $P_1$, $P_2$,
$P_1 : P_2$, $P_1^*$, $P_2^*$, $P_1^* : P_2^*$ respectively. From the definition of least squares partition
$D_1^* \le D_1$ and $D_2^* \le D_2$. From the definition of D in equation (1), $D_{12}^* = D_1^* + D_2^*$ and
$D_{12} = D_1 + D_2$. Hence $D_{12}^* \le D_{12}$, and so $P_1^* : P_2^*$ is a least squares partition.

A program for the “Illiac” automatic digital computer at the University of
Illinois has been written and checked by the author for solving the unrestricted
grouping problem, or the restricted problem when the elements are completely
ordered a priori,¹³ with capacity $K \le 200$, $G \le 10$. The program solves
the problem to the extent of identifying one optimal partition and its asso-
ciated squared distance for $G = 1, 2, \ldots, G_{\max}$, where $G_{\max}$ is specified as
the maximum number of subsets to be considered, and cannot exceed 10. Machine
running time for the largest possible problem is approximately 14 minutes,
including input and output.

Input of the data in the specified order is made on standard teletype tape.
If an unrestricted problem is to be solved, then the $w_i$ and the $a_i$ must be put
into the computer in order according to the numerical values of the $a_i$. The
basic method of solution is to have the “Illiac” systematically identify and
compute D values for all partitions that are relevant after consideration of con-
tiguity and application of the suboptimization lemma. In applying the lemma
to the problem of G subsets, systematic use is made of certain results obtained
and recorded while working the problem of G − 1 subsets. The solution to a
problem appears in the form of Table 794, with one row for each value of G
considered. In fact Table 794 is a precise reproduction of the output of a
particular problem as it emerged from the page printer, with the exception that
some non-significant decimals of the printed D values have been omitted.

The computational program described has at least three limitations: the
magnitude of K and G it will accommodate is still quite limited, and even with
modifications to handle further increases in K and G, computation time will
press on reasonable limits; the program will not identify multiple solutions or
near-solutions; and it will not handle restricted problems other than the special
case of complete one-dimensional ordering. It is to be hoped that future prog-
ress will overcome these shortcomings.


5. GENERALIZATION 


Some ways of generalizing the grouping problem as formulated in this paper 
will be briefly indicated. A stochastic approach to the problem is presented in 
[6], as well as a rationale for dropping the assumption of fixed G, making the 
selection of G a part of the decision, which depends on the value of more de- 
tailed information as compared with the extra “cost of detail.” Even without 
such an explicit theory of cost, knowledge of the change in D resulting from 
change in G (see, for example, the second column of Table 794) may assist the
investigator in making a decision on what G he wants to use when he is initially
uncertain. The “Illiac” program was designed to provide this information.

While mention was made in Section 3 of the possibility of specifying a prop-
erty space in more than one dimension, the idea of a single dimension for meas-
uring squared distance D was retained. It would of course be most desirable to
develop, both theoretically and computationally, a distance criterion that is
defined in more than one dimension. An example of the need for such a formu-
lation is shown in a multivariate stratification problem encountered in a sample
survey by Hagood and Bernert [7]. Any such approach would, of course, involve
a relevant system of weighting the different dimensions to reflect their rela-
tive importance in determining distance.

¹³ Instruction in programming and a key suggestion that made the writing of this program possible were
given the author by D. B. Gillies of the Department of Mathematics of the University of Illinois. Invaluable aid
in certain aspects of programming and debugging was given by Kern Dickman, Computer Consultant at the Uni-
versity of Illinois. A copy of the instruction tape entitled “Optimal Partition of Discrete Points” is available in the
Office of the Computer Consultant, and also in the hands of the author.

The one-dimensional approach of this paper may be used to provide an 
approximate solution to a multi-dimensional grouping problem. As a pre- 
liminary step the data may be reduced to single measures on each element by 
extracting the first principal component, or by other methods of factor analysis. 
Such procedures are now well known and routinized on many types of comput- 
ing machines. Then the computational methods suggested above may be 
applied to group the elements into the desired number of groups. The goodness 
of this approximation will, of course, depend on the degree of dominance of the 
first principal component in the multi-dimensional scatter. 
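The preliminary reduction to a single measure can be sketched with the standard library alone. The code below centres the observations, forms their covariance matrix, finds its dominant eigenvector by power iteration, and returns each element's score on that first principal component; the scores, sorted, could then be passed to any one-dimensional grouping routine. The function name and test data are hypothetical, and the sketch assumes the starting vector of ones is not orthogonal to the leading component.

```python
import math

def first_component_scores(points, iters=200):
    """Project multivariate observations onto their first principal component.

    points: list of equal-length tuples. Returns (direction, scores)."""
    n, p = len(points), len(points[0])
    means = [sum(pt[j] for pt in points) / n for j in range(p)]
    X = [[pt[j] - means[j] for j in range(p)] for pt in points]
    # covariance matrix (divisor n; the scaling does not affect the direction)
    C = [[sum(r[i] * r[j] for r in X) / n for j in range(p)] for i in range(p)]
    v = [1.0] * p
    for _ in range(iters):  # power iteration for the dominant eigenvector of C
        u = [sum(C[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in u))
        v = [x / norm for x in u]
    scores = [sum(r[j] * v[j] for j in range(p)) for r in X]
    return v, scores

direction, scores = first_component_scores([(t, 2.0 * t) for t in range(1, 6)])
print(direction, scores)
```

When the data lie close to a line, as here, the first component carries essentially all of the scatter and the one-dimensional grouping loses little; when other components are substantial, the approximation degrades accordingly.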


APPENDIX: PROOF THAT A LEAST SQUARES PARTITION IS CONTIGUOUS 


Consider a non-contiguous partition as defined in the text. Let elements i
and k belong to a subset having mean $\bar{a}_u$, while j belongs to a different subset
having mean $\bar{a}_v$, and where $a_i < a_j < a_k$. Then, whatever the values of $\bar{a}_u$
and $\bar{a}_v$, at least one of the following three statements is true:

$$|a_i - \bar{a}_u| \ge |a_i - \bar{a}_v| > 0, \qquad (1)$$
$$|a_j - \bar{a}_v| \ge |a_j - \bar{a}_u| > 0, \qquad (2)$$
$$|a_k - \bar{a}_u| \ge |a_k - \bar{a}_v| > 0. \qquad (3)$$

In other words, of the three distinct points, $a_i$, $a_j$, and $a_k$, there exists one whose
distance from the mean of its own subset is equal to or greater than its distance
from the mean of another subset, both distances being positive. Relabelling
such a point as “a,” its own subset A with mean $\bar{a}$, and the “foreign” subset
B with mean $\bar{b}$, we have

$$|a - \bar{a}| \ge |a - \bar{b}| > 0. \qquad (4)$$


From definition (1) of the text the squared distance associated with the given
partition may be written

$$D = \sum_{i=1}^{K} w_i a_i^2 - W_A \bar{a}^2 - W_B \bar{b}^2 - R, \qquad (5)$$

where $W_A = \sum_{i \in A} w_i$, $W_B = \sum_{i \in B} w_i$, and R denotes a weighted sum of squared
means of subsets other than A and B.

Consider the new partition formed by transferring point a from subset A
to subset B. Let A' with mean $\bar{a}'$ and B' with mean $\bar{b}'$ denote the new subsets
after the transfer. Since from (4) point a was distinct from the mean $\bar{a}$, set A
contained at least two points; hence both A' and B' contain at least one point,
and the new partition has the same number of subsets as the old. The new
means $\bar{a}'$ and $\bar{b}'$ can be determined from the relationships

$$(W_A - w)\,\bar{a}' = W_A \bar{a} - w a, \qquad (6)$$
$$(W_B + w)\,\bar{b}' = W_B \bar{b} + w a, \qquad (7)$$
where w is the weight of point a. The squared distance associated with the new
partition is

$$D' = \sum_{i=1}^{K} w_i a_i^2 - (W_A - w)\,\bar{a}'^2 - (W_B + w)\,\bar{b}'^2 - R. \qquad (8)$$

By subtracting (8) from (5), eliminating $\bar{a}'$ and $\bar{b}'$ by means of (6) and (7),
and simplifying, it follows that

$$D - D' = w\left[\frac{W_A}{W_A - w}(a - \bar{a})^2 - \frac{W_B}{W_B + w}(a - \bar{b})^2\right]. \qquad (9)$$


From (4), $(a - \bar{a})^2 \ge (a - \bar{b})^2 > 0$; and since all of the weights are positive
with $W_A > w$, the first coefficient in (9), $W_A/(W_A - w)$, exceeds 1 while the second,
$W_B/(W_B + w)$, is less than 1. The right-hand side of (9) is therefore positive, and
hence D > D'. Thus any non-contiguous partition can always be altered to give
another partition with the same number of subsets and with smaller squared
distance. Hence a least squares partition must be a contiguous partition.
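Identity (9) holds for any transfer of a point between two subsets; condition (4) is needed only for its sign. The short numerical check below, on randomly generated weights and values (a hypothetical test case), compares the change in D computed directly from the definition with the right-hand side of (9).

```python
import random

def within_ss(groups):
    """D for a partition given as lists of (weight, value) pairs."""
    total = 0.0
    for grp in groups:
        W = sum(w for w, _ in grp)
        m = sum(w * v for w, v in grp) / W
        total += sum(w * (v - m) ** 2 for w, v in grp)
    return total

rng = random.Random(7)
A = [(rng.uniform(0.5, 2), rng.uniform(0, 10)) for _ in range(5)]
B = [(rng.uniform(0.5, 2), rng.uniform(0, 10)) for _ in range(4)]
w, a = A[0]  # the point to be transferred from A to B
WA, WB = sum(x for x, _ in A), sum(x for x, _ in B)
abar = sum(x * v for x, v in A) / WA
bbar = sum(x * v for x, v in B) / WB
direct = within_ss([A, B]) - within_ss([A[1:], B + [A[0]]])  # D - D'
formula = w * (WA / (WA - w) * (a - abar) ** 2 - WB / (WB + w) * (a - bbar) ** 2)
print(direct, formula)
```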
SIGNIFICANCE TESTS IN PARALLEL AND IN SERIES


I. J. Good
Government Communications Headquarters, Cheltenham, England 


The advice is often given that significance tests should be selected 
before sampling evidence is examined. It is suggested here that this 
advice is appropriate only for inexperienced statisticians, and an 
approximate rule of thumb is tentatively proposed in the hope of 
provoking discussion, namely that the statistician could in some cases 
use a harmonic mean or weighted harmonic mean of the tail-area 
probabilities arising from various tests, all on the same evidence 
(tests in “parallel”). This rule of thumb should not be used if the 
statistician can think of anything better to do, and especially of course 
if he is in a position to use a sufficient statistic (or an “efficacious” one, 
in a sense defined below). An application is given to the judgment of the 
weights that may be used for combining tests in series.


1. Statisticians live with vagueness, yet most of their published work is 
precise. There are some problems however where vague questions may require 
fairly vague answers. This seems to be true for the question discussed in the 
present paper. 

As I understand it, statistics is not primarily for making objective state- 
ments, but rather for introducing as much objectivity as possible into our sub- 
jective judgments. It is only in limited circumstances that fully objective 
statements can be made, although the literature of theoretical statistics is 
mainly concerned with such circumstances. The notion that it must all be pre- 
cise is harmful enough to be worth naming. I shall call it the “precision fallacy.” 
If we refuse to discuss problems in which vagueness is unavoidable then we 
shall exclude a large proportion of real-life problems from consideration. In
fact every judgment involves vagueness, because when it becomes precise it is 
no longer called a judgment. Vagueness will not disappear if we bury our heads 
in the sand and whistle. An example of the precision fallacy is the giving of 
equal weights to several sources of information on the ground that to do 
otherwise may be arbitrary. 

The present paper is concerned with a class of problems of considerable 
statistical interest where no precise conclusions have so far been reached, and 
where contradictory advice has been given in the published literature. The in- 
tention here is not to produce any final answers but only to provoke discussion. 
The “harmonic-mean rule of thumb” is presented with some misgivings, be- 
cause, like many other statistical techniques, it is liable to be used thought- 
lessly merely because it has appeared in print. More important than this rule 
of thumb is the familiar supplication, “think.” 

2. The advice is often given, and there is a great deal to be said for it, that
a statistician should decide on his significance test before looking at the evidence 
of a sample: 

“...it is absolutely essential that... the test... be fully specified, before 
viewing the sample. This is, of course, an indispensable condition for the validity of 
any test of significance” [15, p. 229]. 
This advice is not always followed by statisticians, who may present several
tests on the same evidence if only in order to please more than one school
of readers, as, for example, by J. B. S. Haldane and C. A. B. Smith [11]. The
advice is given partly in order to safeguard the inexperienced statistician 
against the twin temptations of: 

(a) choosing among several natural-looking statistics the one whose “tail- 
area probability” (in a self-explanatory sense) is least; and 

(b) choosing an artificial statistic, suggested by the evidence alone, and 
having a very small tail-area probability. 

Another, historically important, reason for this advice is so that in routine 
situations the statistician can reject the null hypothesis at a fixed level of 
significance, and know in advance the probability of rejecting the null hypothe- 
sis when it is true. A familiar criticism of this point of view, taken by itself, 
is that the statistic need have no relation to the truth or otherwise of the null
hypothesis and may even be drawn from a table of random numbers. 

But the advice cannot be consistently followed, since a statistician is not 
always aware, in advance, of all the hypotheses that may be strongly suggested 
or even demonstrated by an experiment. A simple example is given [8, p. 253]
which, for the convenience of readers, we quote here: 

“A sample of 100 readings is taken from some distribution for which the null 
hypothesis is that the readings are independently distributed with a normal distribu- 
tion of zero mean and unit variance. It is decided in advance of sampling to divide 
the normal distribution up into ten equal areas, and to apply the χ² test to the
ten-category equiprobable multinomial distribution of frequencies with which the readings
fall into the ten areas. This would appear to be a very reasonable statistic in advance 
of sampling. But what if it leads to a non-significant result even though one of the 
100 readings was twenty standard deviations above the mean?” 

It may sometimes be a matter of dispute whether a “non-null” hypothesis
$\bar H_0$, suggested by the result of an experiment, has too low an initial (prior)
probability to be accepted, in spite of the small tail-area probability (given the
null hypothesis, $H_0$) of some appropriate statistic.

The obvious thing to do if an unexpected hypothesis is suggested by an 
experiment is to organize a further experiment. But this is not always possible 
nor desirable. If a pilot experiment for an urgent agricultural or medical project 
were to suggest, strongly enough, some hypothesis that was not thought of in 
advance, it may be sensible to act on this hypothesis, without delaying matters 
by carrying out another pilot experiment. Furthermore, classical objectivistic 
statistics gives no rule for deciding how strongly a far-fetched hypothesis must 
be suggested by an experiment before time, money and effort should be in- 
vested in another experiment. 

It therefore does not seem to be unreasonable to apply several different tests 
of significance to the same evidence, for testing an assigned null hypothesis, 
even if the tests are not decided in advance and even if some of them are sug- 
gested by the evidence. It is reasonable, dangerous, and often done. 

Although this “multitest” point of view does not seem to have been expressed 
in most textbooks it is not new, in fact it must be almost as old as the use of 
significance tests themselves. The point of view is clearly supported by Cramér 
[3, § 30.8], and more especially by Cochran [2, pp. 417, 447]; both writers 
emphasize that the chi-squared test should often be supplemented by other 
tests, especially when the statistician has some idea of what non-null hypotheses
are initially probable. 

There are at least three general situations in which two or more tests may be 
applied to the same data and the same null hypothesis. We shall throughout 
refer to these as cases A, B, and C. They are: 

A. Several “alternatives,” i.e., several non-null hypotheses, not necessarily 
simple statistical hypotheses and not necessarily very precisely formulated, 
and not necessarily formulated in advance of the experiment. 

B. Several inefficient tests against a fixed alternative. Here we have several 
tests, each designed to test the same non-null hypothesis, but none of the tests 
being fully efficient. (The word “efficient” need not be interpreted in any 
precise sense here.) 

C. Several tests against a fixed alternative, one of them fully efficient for a cer- 
tain mathematical model. The other tests are less efficient but may have the 
advantage of robustness, i.e., they may suffer less when the model is somewhat 
inaccurate. 

More complicated cases are also possible, in which the features of two or all 
three of the above cases are present. 

In cases A and C, it may be possible to make one’s models more comprehen- 
sive, and hence to arrive at a decision whether to reject the null hypothesis, 
by the use of any of several different statistical philosophies. Quite often, however,
the statistician may not be able to face the prospect of such a programme
while still considering it worth while to apply more than one test.

3. In the present section we shall be mainly concerned with cases A and B. 
(Our method may sometimes apply in case C, but we are making no claims.) 

Having applied several tests there is some point in publishing the results of 
all of them. It can be argued that the tests are part of the total experiment. 
To suppress any of them would be like suppressing some of the ordinary obser- 
vations, a procedure that is often exploited by the unscrupulous, and is also 
often forced upon the scrupulous because of the costs of printing. (Of course if 
all the observations are published then it is not logically essential to publish any 
statistical tests, but to suppress any of them could be deliberately and highly 
misleading.) 

But just as observations can sometimes be reduced to a few statistics 
(preferably sufficient), it may sometimes be possible to replace several tail- 
area probabilities by a single tail-area probability that to some extent summa- 
rizes them all. A familiar example is Fisher’s method of combining significance 
tests, which we discuss below. 

The suggestion to be made here is that in cases A and B (several alternatives, 
or several inefficient tests against a fixed alternative) if the tail-area probabili- 
ties of several statistics on the same evidence (and for the same null hypothesis) 
are $P_1, P_2, \ldots, P_n$, then, if the statistician can think of nothing better to do, or
if to do better would involve too much work, it is reasonable for him to
summarize them approximately by using their harmonic mean,

$$\lambda = \frac{n}{\sum_{i=1}^{n} P_i^{-1}},$$

or a weighted harmonic mean,

$$\lambda(w) = \frac{\sum_i w_i}{\sum_i w_i P_i^{-1}},$$
where the weights $w_i$ will need to be selected subjectively. (An example is given
in Appendix III.) The weight $w_i$ should be thought of vaguely as proportional
to the probability that the $i$th test is the one we should be using if we were wise
enough. In case A (several alternatives) the weights would be related to the 
initial probabilities of the alternatives; in case B (several tests, one alternative) 
they would be related to the probabilities that the different tests would reach 
specified significance levels. (Note that if all the tests give the same tail-area 
probabilities then any sort of mean will again give the same tail-area probabil- 
ity, and the choice of weights would then be irrelevant.) The weights will 
typically be judged to lie in intervals rather than as having precise values. If 
these intervals are too wide then the statistician may have to admit that he 
cannot evaluate the evidence without further experiments. 
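A minimal Python sketch of $\lambda$ and $\lambda(w)$, assuming the tail-area probabilities and weights are already in hand; the function name and the example values are illustrative only. It shows how a single small $P_i$ dominates the harmonic mean where an arithmetic mean would swamp it.

```python
def harmonic_mean_p(ps, weights=None):
    """lambda = n / sum(1/P_i), or lambda(w) = sum(w_i) / sum(w_i / P_i) if weights are given."""
    if weights is None:
        weights = [1.0] * len(ps)
    return sum(weights) / sum(w / p for p, w in zip(ps, weights))


ps = [0.30, 0.01, 0.50]                               # hypothetical tail-area probabilities
print(round(harmonic_mean_p(ps), 4))                  # 0.0285
print(round(sum(ps) / len(ps), 4))                    # arithmetic mean, for contrast: 0.27
print(round(harmonic_mean_p(ps, [1.0, 0.2, 1.0]), 4)) # 0.0868, small P down-weighted
```

If all the $P_i$ are equal, both versions return that common value whatever the weights.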

The weights are preferably selected before examining the evidence, but this 
will not always be practicable. If they are selected after looking at the evidence 
they will seldom be all equal, because it is nearly always possible to dream up
far-fetched hypotheses or tests that do well on given observations. 

We shall give below some arguments in favour of the use of the harmonic 
mean or weighted harmonic mean, but first we must clarify what we mean 
semantically by the “use” of the harmonic mean. 

This clarification will be helped by comparing the problem with what is 
usually called the “combination of significance tests,” but which we shall call 
here the “combination of significance tests in series” in order to distinguish it 
from the problem of the present paper which is the “combination of significance 
tests in parallel.” The notion of tests in series and in parallel has an obvious 
analogy with electrical resistances in series and parallel, an analogy that is 
suggestive of more general problems in which tests form a complicated logical 
network, partly in series and partly in parallel. A similar analogy holds with 
parallel and serial automatic computers. The electrical analogy may act as a 
mnemonic for the use of the harmonic mean of tail-area probabilities for tests 
in parallel: there is no temptation to add tail-area probabilities for tests in 
series. 

For independent tests in series a mistake occasionally made is to treat the 
product $P_1P_2P_3\cdots$ as if it were a tail-area probability. Fisher [12, end of
Chapter 21] pointed out that if the statistician must use the statistic $P_1P_2P_3\cdots$
then he would be advised to attend to its distribution. Furthermore he
pointed out that this distribution could be worked out. An extension to the
weighted combination of significance tests in series, i.e., to the use of the
statistic $P_1^{\alpha_1}P_2^{\alpha_2}P_3^{\alpha_3}\cdots$, was given in [7].¹
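For equal weights the distribution referred to has a simple closed form: under the null hypothesis each independent $P_i$ is uniform on (0, 1), so $-2\sum\log P_i$ has a chi-squared distribution with $2n$ degrees of freedom. The Python sketch below (hypothetical helper `product_tail`, illustrative inputs) converts an observed product into its tail-area probability.

```python
import math


def product_tail(ps):
    """Tail-area probability of the product P_1 P_2 ... P_n of independent tests.

    Under the null hypothesis each P_i is uniform on (0, 1), so -2*sum(log P_i)
    is chi-squared with 2n degrees of freedom; for even d.f. this gives
    Pr(product <= q) = q * sum_{k < n} (-log q)**k / k!.
    """
    n = len(ps)
    q = math.prod(ps)
    t = -math.log(q)
    return q * sum(t ** k / math.factorial(k) for k in range(n))


ps = [0.10, 0.20]                    # hypothetical results of two independent tests
print(round(math.prod(ps), 4))       # 0.02, not itself a tail-area probability
print(round(product_tail(ps), 4))    # 0.0982
```

The naive product overstates the significance by a factor of about five here.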

Similarly, for tests in parallel, it would be desirable to know the distribution
of $\lambda$ and $\lambda(w)$. Unfortunately it would be difficult or impossible to find this
distribution in general terms (although in each application the distribution
could be found in principle, at any rate when the null hypothesis is a simple
statistical hypothesis). Therefore by the “use” of $\lambda$ or $\lambda(w)$ we must mean, in
practice, something different from its use as a statistic whose distribution is to
be worked out. The use that we suggest is that $\lambda$ or $\lambda(w)$ should be regarded
as an approximate tail-area probability. Since it is only approximate it would
not be legitimate to combine in series the combinations of parallel tests: at
any rate the approximations would become more inaccurate as more sets of
parallel tests were combined.

¹ It has been drawn to my attention that the mathematical ideas of that paper had been previously published
for other applications. See, for example, Box [1], Darling [4], Gurland [10], Robbins and Pitman [14], K. Pearson,
Stouffer and David [13]. Wallis has further mentioned that the idea of a weighted combination of significance
tests (in series) was once suggested by Paul Samuelson in a private letter.

4. When we have made several tests of the same hypothesis, either in series 
or in parallel, there is no compulsion to combine them, instead we could just 
look at the results of the tests separately and form an overall judgment. But 
it makes for economy of thought and is in the spirit of statistical methods to 
represent the results of the tests by a single number if possible. For a set of 
parallel tests it is intuitively clear that some sort of mean of the tail-area proba- 
bilities is required. The harmonic mean suggests itself since otherwise a single 
very small probability would be swallowed up by the presence of non-significant 
tests. In order to counteract temptation (b) it is natural to use a weighted 
harmonic mean. This argument for the use of $\lambda$ or $\lambda(w)$ will now be supported
by another intuitive argument, this time a neo-Bayesian one. (By a neo-Bayesian
or neo-Bayes-Laplace philosophy we mean one that makes use of
inverse probabilities, with or without utilities, but without necessarily using 
Bayes’s postulate of equiprobable or uniform initial distributions, and with 
explicit emphasis on the use of probability judgments in the form of in- 
equalities.) 

Suppose that we are testing a null hypothesis $H_0$, and that we have evidence
$E$ and statistics $\phi_1(E), \phi_2(E), \phi_3(E), \ldots$ whose tail-area probabilities, given
the null hypothesis, are $P_1, P_2, P_3, \ldots$. (Some of these may be double-tail-area
probabilities.) We have in mind a non-null hypothesis $\bar H_0$, often both vague
and highly composite, and we call

$$F_i = \Pr(\phi_i(E) \mid \bar H_0) \,/\, \Pr(\phi_i(E) \mid H_0)$$

the “Bayes factor” in favour of $\bar H_0$ (or against $H_0$) provided by a knowledge of
the statistic $\phi_i(E)$. (For the terminology see, for example, [6, Chapter 6].) We
mention in passing that if the statistic always provides the same factor in
favour of $\bar H_0$ as does the random event $E$, then the statistic may be described
as “efficacious for testing $H_0$.” (See Appendix I.) A sufficient statistic (sufficient
for “estimating which hypothesis is true,” to coin a phrase) is always efficacious,
but not conversely. The Bayes factor is equal to the likelihood ratio when $H_0$
and $\bar H_0$ are both simple statistical hypotheses, but not otherwise in general.
The Bayes factor is the factor by which the initial odds of $\bar H_0$ are multiplied in
virtue of the knowledge of the value of $\phi_i(E)$. Since our judgments are seldom
precise, $F_i$ is not usually known precisely but is circumscribed only by
inequalities. I have usually found however, in numerous examples, that when
$\phi_i$ is a reasonable statistic and when the assumed initial distributions are also
reasonable, then if $0.001 < P_i < 0.2$,

$$F_i = \frac{1}{\gamma P_i},$$

where $\gamma$ lies within a factor of 3 of 10 (i.e. $3\tfrac13 < \gamma < 30$), and when $P_i > 0.01$, $\gamma$ is often about 4 or 5.
(See Appendix IV.) For example, the familiar tail-area probability of 5 per cent 
is often worth a Bayes factor of about 5 or 4 against the null hypothesis. That
is why a tail-area probability of 5 per cent is often good grounds for doing 
another experiment. 

For simplicity we shall now assume that $\gamma$ is an absolute constant. This is
certainly not true in general; there is a tendency for $\gamma$ to be slightly larger
(say by a factor of about 2) for very highly significant experiments. The argument
could be modified to allow for $\gamma$’s not being an absolute constant. The
effect would be that our harmonic mean $\lambda$ would be replaced by a weighted
harmonic mean with imprecisely defined weights (i.e. weights circumscribed
only by inequalities), while $\lambda(w)$ would also have its weights made vaguer.
If we were combining tests in series, the approximation of regarding $\gamma$ as an
absolute constant would be far more dangerous because the errors would be
cumulative. Nevertheless the argument applied to tests in series does provide a
rough justification for using $P_1P_2P_3\cdots$ as a statistic when the individual tests
deserve equal weights. (If $\gamma$ were an absolute constant, then $1/(\gamma^n P_1P_2\cdots P_n)$
would be the Bayes factor, so that $P_1P_2\cdots P_n$ would be an efficacious
statistic.)

Let $w_1, w_2, w_3, \ldots$ be proportional to the initial probabilities that the
statistic that we should be using, if we were wise enough, would be $\phi_1, \phi_2,
\phi_3, \ldots$ respectively. (For example, if two of the tests were almost the same
test, then the weight that either would get if used without the other would be
roughly equally shared between the two tests when used together. Hence we
should get nearly the same result whether we used [in addition to further tests
in parallel] both tests or only one of them, and this is as it should be.) Then,
by an intuitive analogue of the theorem of the weighted average of Bayes
factors [6, p. 68], we may summarize the parallel statistics by means of a Bayes
factor of


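$$\frac{w_1F_1 + w_2F_2 + w_3F_3 + \cdots}{w_1 + w_2 + w_3 + \cdots} = \frac{1}{\gamma}\cdot\frac{w_1/P_1 + w_2/P_2 + w_3/P_3 + \cdots}{w_1 + w_2 + w_3 + \cdots}.$$

This is the same as replacing $P_1, P_2, P_3, \ldots$ by the tail-area probability

$$\frac{w_1 + w_2 + w_3 + \cdots}{w_1/P_1 + w_2/P_2 + w_3/P_3 + \cdots}.$$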
This completes the second intuitive demonstration of our suggestion for combining
parallel tests. Note that it goes further than the first intuitive demonstration
in that it gives more substance to the weights $w_1, w_2, w_3, \ldots$, at any
rate in case A (several alternatives). For in case A the weights would be
proportional to the initial probabilities of the alternatives.
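The algebra of the averaged Bayes factor can be checked numerically. This Python sketch assumes, as the text does, that $F_i = 1/(\gamma P_i)$ with $\gamma$ an absolute constant; the values of $\gamma$, the $P_i$ and the $w_i$ are hypothetical, and the helper names are illustrative.

```python
def approx_bayes_factor(p, gamma):
    """F = 1/(gamma * P): the rough rule of thumb, with gamma a judged constant."""
    return 1.0 / (gamma * p)


def averaged_factor(ps, ws, gamma):
    """Weighted average of the approximate Bayes factors of tests in parallel."""
    return sum(w * approx_bayes_factor(p, gamma) for p, w in zip(ps, ws)) / sum(ws)


def weighted_harmonic_mean(ps, ws):
    return sum(ws) / sum(w / p for p, w in zip(ps, ws))


# A 5 per cent tail area is worth a factor of 5 or 4 when gamma is 4 or 5.
print(approx_bayes_factor(0.05, 4), approx_bayes_factor(0.05, 5))  # 5.0 4.0

ps, ws, gamma = [0.04, 0.20, 0.003], [2.0, 1.0, 1.0], 4.5   # hypothetical values
lam_w = weighted_harmonic_mean(ps, ws)
# Averaging the factors gives the same result as treating lam_w as one tail area.
assert abs(averaged_factor(ps, ws, gamma) - approx_bayes_factor(lam_w, gamma)) < 1e-9
```

The equality holds for any constant $\gamma$, which is why the choice of $\gamma$ drops out of $\lambda(w)$.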

5. The method of the present paper sheds some light on the weighted combination
of tests in series. The statistic

$$Q(\alpha_1, \alpha_2, \ldots) = P_1^{\alpha_1} P_2^{\alpha_2} \cdots$$

has a distribution that can be readily calculated [7, equation (1)]. W. H.
Kruskal has privately raised the question of what rule of thumb may be used
for the selection of the weights $\alpha_1, \alpha_2, \ldots$. Many different rules of thumb may
be used, and the following one appeals to me personally.

Let tests 1, 2, 3, … be such that tail-area probabilities for them of $a_1, a_2,
a_3, \ldots$ would be equally impressive as evidence against the null hypothesis,
where the geometric mean of $a_1, a_2, a_3, \ldots$ is about 1/20. Then take $\alpha_i$ such
that

$$a_i^{\alpha_i} = 1/20 \quad (i = 1, 2, 3, \ldots).$$

These values of the $\alpha$’s, so to speak, put all the tests on an equal footing
when they are equally impressive. Call this the 5 per cent rule of thumb for
combining tests in series. We can similarly define a 1 per cent rule of thumb
with 20 replaced by 100, and a 0.1 per cent rule of thumb with 20 replaced by
1000. In this way we get statistics

$$Q(5\%), \quad Q(1\%), \quad Q(0.1\%),$$

and can compute their tail-area probabilities. We now have the tail-area
probabilities of three tests in parallel, each applied to the same set of tests in
series. (These three tests correspond to non-null hypotheses in which successively
greater departures from the null hypothesis are expected, so that we are
in case A.) We may combine them by taking a weighted harmonic mean.
Reasonable general-purpose weights would be 3, 2, 1. All this is subjective, but
we think not hopelessly so. To take all the $\alpha$’s equal, as has usually been done
in the past, is more objective and gives less scope to the statistician’s judgment.
To insist on it would be an example of the precision fallacy.
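The 5 per cent, 1 per cent and 0.1 per cent procedure can be sketched in Python. The distribution of $Q$ is computed here from the standard closed form for a sum of independent exponential variables with distinct means $\alpha_i$ (since $-\log Q = \sum \alpha_i(-\log P_i)$); this is offered as one way to do the calculation, not as a transcription of [7]. The observed $P$’s and the “equally impressive” sets $a_i$ are hypothetical judgments, and the helper names are illustrative.

```python
import math


def weighted_product_tail(ps, alphas):
    """Pr(Q <= q) under the null hypothesis, where Q = prod P_i**alpha_i and q is
    its observed value.

    With each P_i uniform, -log Q is a sum of independent exponentials with
    means alpha_i, so
        Pr(Q <= q) = sum_r q**(1/alpha_r) * prod_{s != r} alpha_r / (alpha_r - alpha_s).
    Assumes distinct alphas; nearly equal alphas make the sum ill-conditioned.
    """
    q = math.prod(p ** a for p, a in zip(ps, alphas))
    total = 0.0
    for r, a_r in enumerate(alphas):
        coef = 1.0
        for s, a_s in enumerate(alphas):
            if s != r:
                coef *= a_r / (a_r - a_s)
        total += coef * q ** (1.0 / a_r)
    return total


def rule_of_thumb_alphas(equally_impressive, level):
    """alpha_i with a_i ** alpha_i = level (level = 1/20 for the 5 per cent rule)."""
    return [math.log(level) / math.log(a) for a in equally_impressive]


def weighted_harmonic_mean(ps, ws):
    return sum(ws) / sum(w / p for p, w in zip(ps, ws))


# Hypothetical inputs: observed P's of three independent tests, and for each
# rule a judged set of equally impressive tail-area probabilities a_i.
observed = [0.04, 0.10, 0.02]
judged = {
    1 / 20: [0.05, 0.08, 0.03],
    1 / 100: [0.01, 0.02, 0.005],
    1 / 1000: [0.001, 0.003, 0.0004],
}
tails = [weighted_product_tail(observed, rule_of_thumb_alphas(a, level))
         for level, a in judged.items()]
combined = weighted_harmonic_mean(tails, [3, 2, 1])   # general-purpose weights 3, 2, 1
print([round(t, 4) for t in tails], round(combined, 4))
```

Rescaling all the $\alpha$’s by a common factor leaves each tail-area probability unchanged, so only their ratios matter.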

I am grateful to the Referees for several useful suggestions, especially 
concerning the method of presentation. 


APPENDIX I. EFFICACIOUS STATISTICS 


The description of an efficacious statistic, given in the main text, requires 
some elaboration, especially for the later appendixes. The non-mathematical 
reader may however skip the present appendix without much loss of under- 
standing. 

Let $H_0$ be a simple statistical hypothesis, designated the “null hypothesis.”
Let $E$ be the result of an experiment, and let $\phi(E)$ be a statistic, that is a
numerical function of $E$.

In formal logic the negation of $H_0$ is often denoted by $\bar H_0$. We shall think of
$\bar H_0$ as the logical disjunction of all the alternatives to $H_0$ that are worth
considering. Each alternative is a non-null hypothesis whereas $\bar H_0$ itself is the
non-null hypothesis. As a special case $\bar H_0$ may be a simple statistical hypothesis.
Otherwise it is composite and is the logical disjunction of a number of simple
hypotheses, where this number may be finite, enumerable, of the power of the
continuum, or even more. Given $\bar H_0$ we suppose that there is a “true” initial
(=prior) distribution, $\xi$, of the simple non-null hypotheses. (What $\xi$ does is to
specify the probabilities of all well-definable disjunctions of subsets of these
hypotheses, in a consistent, i.e. a completely additive manner.) The statistician
may not be prepared to assume $\xi$, but he will always assume that it belongs to
some class $\Xi$. In particular, when he says that he assumes nothing he is assuming
that $\Xi$ is the (“universal”) class of all possible $\xi$’s. When $\bar H_0$ is a simple
statistical hypothesis, then $\xi$ is uniquely defined, and in this case it is not necessary
to refer to $\xi$ at all.

Let

$$F(\bar H_0/H_0 : E \mid \xi)$$

be read from left to right and denote “the Bayes factor in favour of $\bar H_0$ as
against $H_0$, provided by $E$, given $\xi$.”² (Any given proposition can be represented
on the right of the vertical stroke.) Suppose that for each $\xi$ in $\Xi$, $F(\bar H_0/H_0 : E \mid \xi)$
is a strictly monotonic bicontinuous function of $\phi(E)$. (By “bicontinuous” we
mean that $F$ is a continuous function of $\phi(E)$ and conversely $\phi(E)$ is a continuous
function of $F$.) Then $\phi(E)$ may be described as efficacious for testing $H_0$
relative to $\Xi$. The justification for this complimentary term is that, for each
$\xi$, it comes to the same to know the tail-area probabilities of $F$ and $\phi$. When $\Xi$
is the universal class we may say that $\phi(E)$ is completely efficacious for testing
$H_0$ relative to $\bar H_0$.

When $\bar H_0$ is a simple statistical hypothesis we may say that $\phi(E)$ is simply
efficacious for testing $H_0$ relative to $\bar H_0$. A simply efficacious statistic is a special
case of a completely efficacious one.

In an example in Appendixes II and III the restrictions on the $\xi$’s are so
weak that we feel justified in describing the statistic as “nearly” completely 
efficacious, but we shall not propose a rigorous definition of this expression. 

In many problems $\xi$ depends only on a parameter which is a point in a
finite-dimensional Euclidean space. ($\xi$ is unique if the dimensionality is zero.)
This is a much bigger restriction than was just intended, and in this case $\phi$
could be described as largely efficacious for testing $H_0$ relative to the class of
assumed initial distributions.

Simply efficacious and completely efficacious statistics are not necessarily
sufficient for testing $H_0$, but they are just as good as sufficient statistics when
we are sure that $\bar H_0$ covers all reasonable alternative hypotheses. Largely
efficacious statistics also leave little to be desired.


APPENDIX II. TESTING THE MEAN OF A MULTIVARIATE NORMAL DISTRIBUTION 
OF KNOWN COVARIANCE MATRIX


A referee of this paper suggested an example that involved sampling a bivari- 
ate normal distribution. It is considered in the next appendix. In preparation 
for it, and for the final appendix, we shall now consider the problem of testing 
whether the mean of a multivariate normal distribution is at the origin, given 
a sample, when the covariance matrix, C, is known. (The less mathematical 
readers may like to skim through the present appendix.) 

Denote by $N(\mu, C)$ the $k$-dimensional non-singular multivariate normal
distribution whose density is

$$(2\pi)^{-k/2}\,|C|^{-1/2}\exp\{-\tfrac12 (x - \mu)'C^{-1}(x - \mu)\},$$


where |C| means the determinant of C, and the prime denotes transposition. 
We wish to test the “null hypothesis,” $H_0$, that $\mu = 0$, where $0$ is the origin of
coordinates. We take a sample of one $k$-dimensional vector $x$. There is no real
loss of generality in taking a sample of only one vector, since the $nk$ components of
$n$ vectors sampled independently can be regarded as sampled from an obvious
$kn$-dimensional multivariate normal distribution. There is also no real loss of
generality in restricting the discussion to testing whether $\mu$ is at the origin of
coordinates: the more general case is reducible to this one by means of a simple
change of variables.

² The colon to denote “provided by” has also been used in information theory. (Proc. Inst. Elec. Eng.
(C) (8) 103 (1956), 200-4.)

The most familiar test of $H_0$ is in terms of the likelihood-ratio statistic, twice
the natural logarithm of which is

$$\Lambda = x'C^{-1}x.$$

$\Lambda$ has a gamma-variate (chi-squared) distribution with $k$ degrees of freedom
when the null hypothesis is true (for example, [16, p. 104]).
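A small pure-Python sketch of $\Lambda$ and its null tail-area probability. The covariance matrix and observation are hypothetical, the linear solve is ordinary Gaussian elimination rather than any library routine, and the chi-squared tail uses the closed form that holds only for even $k$.

```python
import math


def solve(C, x):
    """Solve C y = x by Gaussian elimination with partial pivoting."""
    n = len(x)
    M = [[float(c) for c in row] + [float(b)] for row, b in zip(C, x)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    y = [0.0] * n
    for r in range(n - 1, -1, -1):
        y[r] = (M[r][n] - sum(M[r][c] * y[c] for c in range(r + 1, n))) / M[r][r]
    return y


def lr_statistic(x, C):
    """Lambda = x' C^{-1} x, twice the log of the likelihood ratio for mu = 0."""
    return sum(xi * yi for xi, yi in zip(x, solve(C, x)))


def chi2_tail_even(lam, k):
    """Pr(chi-squared with k d.f. > lam) for even k (closed-form Poisson sum)."""
    if k % 2:
        raise ValueError("this sketch only handles even k")
    h = lam / 2.0
    return math.exp(-h) * sum(h ** j / math.factorial(j) for j in range(k // 2))


C = [[1.0, -0.5], [-0.5, 1.0]]   # hypothetical known covariance matrix (k = 2)
x = [1.0, 1.0]                   # hypothetical observed vector
lam = lr_statistic(x, C)
print(lam, round(chi2_tail_even(lam, 2), 4))   # 4.0 0.1353
```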

If the non-null hypothesis were a simple statistical hypothesis, $H_\mu$, in which
$\mu$ had a specified value, then the Bayes factor (or simple likelihood ratio)
against $H_0$ would be

$$\exp(x'C^{-1}\mu - \tfrac12 \mu'C^{-1}\mu),$$

and the statistic $x'C^{-1}\mu$ would be simply efficacious. The statistic $x'C^{-1}\mu/|\mu|$
is completely efficacious relative to the class of non-null hypotheses for which
the mean lies on an assigned semi-infinite straight line starting at the origin.
The statistic $|x'C^{-1}\mu|/|\mu|$ is nearly completely efficacious relative to the class
of non-null hypotheses for which the mean lies on an assigned line through the
origin, infinite in both directions, assuming only that the densities at pairs of
“antipodal” points are equal, i.e. that the initial distributions are symmetrical
about the origin. (This is the weak restriction on the $\xi$’s that justifies the
expression “nearly completely efficacious.”) In this case the Bayes factor is a
weighted integral of $\cosh(x'C^{-1}\mu)$ over the points $\mu$ on one half of the line,
and is therefore a monotonic function of the statistic.

The expected log-factor (expected weight of evidence) in favour of $H_\mu$ when
it is true is $\tfrac12\mu'C^{-1}\mu$ and when false (i.e. when $H_0$ is true) is $-\tfrac12\mu'C^{-1}\mu$.
Therefore, for a given value of $|\mu|$, the least favourable $\mu$’s are the two antipodal
points, on the sphere $|\mu| = $ constant, that minimize $\mu'C^{-1}\mu/(\mu'\mu)$, and they
correspond to the eigenvector of $C^{-1}$ of smallest eigenvalue. This is the same as
the eigenvector of $C$ of largest eigenvalue. (All the eigenvalues of $C$ are real
and positive. The linear manifold containing the eigenvectors that correspond
to the largest eigenvalue may be of dimension more than 1, but I shall assume
for the sake of simplicity that the dimensionality is 1.) If $a$ is such an eigenvector,
then the statistic $|x'a|$ (which is proportional to $|x'C^{-1}\mu|$) is nearly
completely efficacious for testing $H_0$ relative to those departures from $H_0$ “lying in
the direction through the origin” that present the greatest difficulty of
discrimination. It is the best statistic for a pessimistic statistician.
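For $k = 2$ the least favourable direction can be found in closed form. This Python sketch (hypothetical helper `largest_eigvec_2x2`, hypothetical $C$) returns the unit eigenvector $a$ of $C$ for its largest eigenvalue; the pessimistic statistic is then $|x'a|$.

```python
import math


def largest_eigvec_2x2(C):
    """Largest eigenvalue of a symmetric 2x2 matrix C and a unit eigenvector for it."""
    p, q, r = C[0][0], C[0][1], C[1][1]
    lam = (p + r) / 2 + math.hypot((p - r) / 2, q)
    if q != 0:
        v = (q, lam - p)                  # satisfies (C - lam I) v = 0 when q != 0
    else:
        v = (1.0, 0.0) if p >= r else (0.0, 1.0)
    norm = math.hypot(*v)
    return lam, (v[0] / norm, v[1] / norm)


C = [[1.0, -0.5], [-0.5, 1.0]]            # hypothetical known covariance matrix
lam_max, a = largest_eigvec_2x2(C)


def pessimistic_statistic(x):
    """|x'a|, proportional to |x'C^{-1}mu| for mu on the least favourable line."""
    return abs(x[0] * a[0] + x[1] * a[1])


print(lam_max, a, pessimistic_statistic([2.0, -1.0]))
```

For this $C$ the least favourable line runs along $(1, -1)$, the direction of largest variance.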

Suppose next that, if $H_0$ is false, $\mu$ has an initial distribution that is itself
multivariate normal, $N(0, D)$. Then, by the theorem of the weighted average
of factors [6, p. 68], the Bayes factor is

$$F = (2\pi)^{-k/2}\,|D|^{-1/2}\int \exp(x'C^{-1}\mu - \tfrac12\mu'C^{-1}\mu - \tfrac12\mu'D^{-1}\mu)\,d\mu$$
(if $D$ is non-singular) and this, by straightforward mathematics, gives the
Bayes factor

$$F = |C|^{1/2}\,|C + D|^{-1/2}\exp\{\tfrac12 x'C^{-1}D(C + D)^{-1}x\}.$$

(This will be true even if $D$ is singular. In particular if $D = aa'$, which implies
that the mean is on the least favourable direction through the origin, we are
led back to the statistic $|x'a|$.)

It is interesting to observe that this Bayes factor reduces to a function of
the likelihood ratio, i.e. to a function of $\Lambda$, if and only if $D$ is proportional to
(i.e. a scalar multiple of) $C$ (in which case all the correlation coefficients between
the components of $\mu$, in its initial distribution, are the same as those
between the components of $x$). It may be conjectured that the Bayes factor is a
function of $\Lambda$ if and only if the initial distribution of $\mu$, given $\bar H_0$, is of the
form $N(0, D)$ with $D$ proportional to $C$. If so we could say that the use of the
likelihood-ratio statistic $\Lambda$ is equivalent to assuming such an initial distribution
of $\mu$, given $\bar H_0$. The necessary and sufficient condition for $\Lambda$ to be a reasonable
statistic would be that we could reasonably make this assumption about the
initial distribution of $\mu$. At any rate $\Lambda$ is “largely efficacious” relative to these
initial distributions.
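The reduction to a function of $\Lambda$ when $D = \beta C$ can be illustrated numerically for $k = 2$. Substituting $D = \beta C$ into the general formula gives $F = (1+\beta)^{-k/2}\exp\{\beta\Lambda/(2(1+\beta))\}$. The Python sketch below evaluates the general formula directly for two vectors with the same $\Lambda$; with $D = \beta C$ they receive the same factor, while with a non-proportional $D$ (here $2I$, chosen only for illustration) they do not. All matrices and vectors are hypothetical.

```python
import math


def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]


def inv2(M):
    d = det2(M)
    return [[M[1][1] / d, -M[0][1] / d], [-M[1][0] / d, M[0][0] / d]]


def matmul2(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def quad2(x, M):
    return sum(x[i] * M[i][j] * x[j] for i in range(2) for j in range(2))


def bayes_factor(x, C, D):
    """F = |C|^(1/2) |C+D|^(-1/2) exp{x' C^-1 D (C+D)^-1 x / 2}, for k = 2."""
    CpD = [[C[i][j] + D[i][j] for j in range(2)] for i in range(2)]
    M = matmul2(matmul2(inv2(C), D), inv2(CpD))
    return math.sqrt(det2(C) / det2(CpD)) * math.exp(0.5 * quad2(x, M))


def bayes_factor_proportional(lam, k, beta):
    """The same F when D = beta*C: depends on x only through Lambda = x'C^-1 x."""
    return (1 + beta) ** (-k / 2) * math.exp(beta * lam / (2 * (1 + beta)))


C = [[1.0, -0.5], [-0.5, 1.0]]              # hypothetical known covariance (k = 2)
beta = 2.0
D = [[beta * c for c in row] for row in C]  # initial covariance proportional to C
x1, x2 = [1.0, 1.0], [2.0, -1.0]            # both have Lambda = x'C^{-1}x = 4
print(bayes_factor(x1, C, D), bayes_factor(x2, C, D),
      bayes_factor_proportional(4.0, 2, beta))
D_diag = [[2.0, 0.0], [0.0, 2.0]]           # not proportional to C
print(bayes_factor(x1, C, D_diag), bayes_factor(x2, C, D_diag))
```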

In some circumstances it may be more reasonable to assume that $D$ is a
diagonal matrix, since, with appropriate changes of scale, this would make the
initial distribution of $\mu$ spherically symmetrical about the origin.

Whatever assumption is made about $D$, an explicit formula can be obtained
for the distribution of $F$, because the distribution of any quadratic form in $x$ is
known [1]. In particular, the expected weights of evidence, given $H_0$ and given
$\bar H_0$, are (in “natural bels”)

$$\mathcal{E}(\log F(\bar H_0/H_0 : E) \mid H_0) = \mathcal{E}(\log F \mid H_0) = \tfrac12\operatorname{tr}\{(C + D)^{-1}D\} - \tfrac12\log|I + C^{-1}D|$$

and

$$\mathcal{E}(\log F \mid \bar H_0) = \tfrac12\operatorname{tr}(C^{-1}D) - \tfrac12\log|I + C^{-1}D|.$$


It may be verified by pure mathematics that the first of these expectations is 
negative and the second one is positive (compare [6, p. 72]). 

In particular, if $D = \beta C$, where $\beta$ is a scalar, in which case the likelihood ratio
is largely efficacious, then

$$\mathcal{E}(\log F \mid H_0) = -\tfrac12 k\{\log(1 + \beta) - \beta/(1 + \beta)\},$$

and

$$\mathcal{E}(\log F \mid \bar H_0) = \tfrac12 k\{\beta - \log(1 + \beta)\}.$$


If we are determined to use the likelihood-ratio statistic then we may as well
assume, for the sake of consistency, that $D = \beta C$ (at any rate if we assume that
$D$ exists at all). The question remains what value of $\beta$ should be selected. A
reasonable answer to this can be provided by the following philosophy, which
may be called “marginalism.”
Suppose that we have decided to perform a sampling experiment in order to
test $H_0$. If we believed that the expected weight of evidence from the experiment,
given $H_1$, was negligible, we should have decided not to do the experiment.
So, for consistency, we must assume that the expectation is at least of
the order of one natural bel (i.e. is at least, say, 1 when natural logarithms are
used). If the expectation is much greater than one natural bel, then the experiment
will probably be so informative that it will not be serious if we underestimate
$\beta$. It is important to get $\beta$ about right only in a “marginal case,” i.e.
if the experiment is only just worth doing with the assigned sample size.
Accordingly it is reasonable to determine $\beta$ by equating $\mathcal{E}(\log F \mid H_1)$ to, say, 1
or 2. An example will be given in Appendix IV to suggest that it does not make
much difference which of these values is used to determine $\beta$.
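Under the assumption $D = \beta C$, the marginalist rule amounts to solving $\tfrac12 k\{\beta - \log(1+\beta)\} = c$ for $\beta$, with $c$ equal to 1 or 2 natural bels. The left side increases with $\beta$ for $\beta > 0$, so a simple bisection should suffice. This is a minimal sketch, and `marginal_beta` is a hypothetical helper name. For $k = 10$ it reproduces, to the quoted precision, the values $0.772$ and $1.18$ used in Appendix IV.

```python
import math

def marginal_beta(k, target, lo=1e-9, hi=100.0, tol=1e-12):
    """Solve (k/2) * (beta - log(1 + beta)) = target for beta > 0.

    The left side is increasing in beta (its derivative is
    (k/2) * beta / (1 + beta) > 0), so bisection finds the unique root."""
    def excess(beta):
        return 0.5 * k * (beta - math.log1p(beta)) - target

    if excess(hi) <= 0:
        raise ValueError("increase hi: the root lies above it")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

beta1 = marginal_beta(10, 1.0)   # expected weight of evidence of 1 natural bel
beta2 = marginal_beta(10, 2.0)   # expected weight of evidence of 2 natural bels
print(round(beta1, 3), round(beta2, 2))   # about 0.772 and 1.18
```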

In theory a perfectly rational Bayesian ought to know the value of $\beta$, and
a neo-Bayesian ought to be able to judge inequalities for $\beta$, independently of
the sample size. It is legitimate for the neo-Bayesian to make use of marginalism 
provided that he can judge in advance what sample size is reasonable. When 
acting as a statistical adviser he may sometimes be inclined to leave this 
judgment to his client. It would be advisable to ask the client to judge a range 
of reasonable sample sizes. 


APPENDIX III. AN EXAMPLE CONCERNING THE HARMONIC-MEAN RULE OF THUMB 


There is no “special pleading” in the example of this appendix, since it was 
suggested by a referee who, owing to a mistake in sign, thought it virtually 
disproved our tentative rule of thumb for combining tests in parallel. 

Let $N(\mu_x, \mu_y, \sigma^2, \tau^2, \rho\sigma\tau)$ be the bivariate normal distribution with means $\mu_x, \mu_y$,
variances $\sigma^2, \tau^2$, and covariance $\rho\sigma\tau$.

The null hypothesis, $H_0$, is $N(0, 0, 1, 1, -\tfrac12)$.

The class of alternative hypotheses is $N(\mu_x, \mu_y, 1, 1, -\tfrac12)$ with $(\mu_x, \mu_y) \neq (0, 0)$.

Thirty samples were drawn, each consisting of five 2-vectors, $(x_1, y_1), \ldots,
(x_5, y_5)$. For this purpose, pages 1 to 3 of Fieller, Lewis, and Pearson [5] were
used.

(When applying the theory of Appendix II to this example it should be
noted that $k = 10$. The covariance matrix of that appendix consists of five
matrices

$$\begin{pmatrix} 1 & -\tfrac12 \\ -\tfrac12 & 1 \end{pmatrix}$$

down the main diagonal, and eighty zeroes elsewhere.)

Two methods of testing the null hypothesis were used. 

Method I was to make two separate tests for the means of each variable, 
by calculating 


$$z_1 = \sqrt{5}\,\bar{x}, \qquad z_2 = \sqrt{5}\,\bar{y},$$

where $\bar{x}$ and $\bar{y}$ are the means of the five first components and the five second
components respectively. These statistics are each (standardised) normal deviates,
given $H_0$, and the two double-tail-area probabilities $P_1$ and $P_2$ were determined.
Then the harmonic mean, $P_{\mathrm{I}}$, of $P_1$ and $P_2$ was evaluated. (There was no
reason to give $P_1$ and $P_2$ unequal weights.)
In Method II the statistic was $|z|$, where

$$z = (\bar{x} - \bar{y})\sqrt{5/3} = \Big(\sum_{i=1}^{5} x_i - \sum_{i=1}^{5} y_i\Big)\Big/\sqrt{15},$$

and the tail-area probability was denoted by $P_{\mathrm{II}}$. This statistic was advocated
by the referee and, by Appendix II, it is nearly completely efficacious for detecting
those departures from $H_0$ for which the mean lies in the least favourable
direction through the origin.
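A minimal sketch of the two methods follows, using the standard library only. The function names are hypothetical. As in this appendix, it assumes unit variances and correlation $-\tfrac12$ within each pair, and it computes double-tail areas of a normal deviate from the complementary error function.

```python
import math
import random

def two_sided_p(z):
    """Double-tail area P(|Z| >= |z|) for a standard normal deviate z."""
    return math.erfc(abs(z) / math.sqrt(2))

def method_one(xs, ys):
    """Separate tests on the two means, combined by the harmonic mean."""
    n = len(xs)
    p1 = two_sided_p(math.sqrt(n) * sum(xs) / n)   # z1 = sqrt(n) * xbar
    p2 = two_sided_p(math.sqrt(n) * sum(ys) / n)   # z2 = sqrt(n) * ybar
    p_harmonic = 2.0 / (1.0 / p1 + 1.0 / p2)       # equal weights
    return p1, p2, p_harmonic

def method_two(xs, ys, rho=-0.5):
    """Test on xbar - ybar, whose variance is (2 - 2*rho)/n under H0."""
    n = len(xs)
    z = (sum(xs) - sum(ys)) / n / math.sqrt((2 - 2 * rho) / n)
    return two_sided_p(z)

# One simulated sample of five pairs under H0 (correlation -1/2).
rng = random.Random(0)
rho = -0.5
pairs = []
for _ in range(5):
    u, v = rng.gauss(0, 1), rng.gauss(0, 1)
    pairs.append((u, rho * u + math.sqrt(1 - rho ** 2) * v))
xs, ys = zip(*pairs)
print(method_one(xs, ys), method_two(xs, ys))
```

With $n = 5$ and $\rho = -\tfrac12$, the variance $(2 - 2\rho)/n$ equals $3/5$, which is where the factor $\sqrt{5/3}$ comes from.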

It will be seen from Table 810 that the agreement between the columns
headed $P_{\mathrm{I}}$ and $P_{\mathrm{II}}$ is quite good: in all but 3 of the thirty samples we have
$\tfrac12 P_{\mathrm{II}} < P_{\mathrm{I}} < 2P_{\mathrm{II}}$. There is some tendency for $P_{\mathrm{II}}$ to exceed $P_{\mathrm{I}}$; this is consistent
with the view that it is more informative, since the null hypothesis was true for
all thirty samples.


TABLE 810

SOME STATISTICS AND TAIL-AREA PROBABILITIES FOR THIRTY
SAMPLES FROM A BIVARIATE NORMAL DISTRIBUTION. $P_{\mathrm{I}}$ IS
THE HARMONIC MEAN OF $P_1$ AND $P_2$

[Table body (Expt. No. 1 to 30 with the corresponding statistics and tail-area probabilities) not recoverable from the scan.]

The column headed $P_\Lambda$ gives the tail-area probability of the likelihood-ratio
statistic, twice the natural logarithm of which is $\Lambda = \tfrac43 \sum_{i=1}^{5} (x_i^2 + x_i y_i + y_i^2)$.
It agrees less well with either $P_{\mathrm{I}}$ or $P_{\mathrm{II}}$ than they agree with each other. On the face of it,
therefore, Method I seems to be more sensitive than the likelihood-ratio method
when nature is in a nasty mood. But we should need several more examples
before we could state this as a general rule.
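Under $H_0$ the statistic $\Lambda = x^{\top} C^{-1} x$ has a $\chi^2$ distribution with $k = 10$ degrees of freedom. Its tail area can therefore be computed without special libraries: for an even number $2m$ of degrees of freedom, $\Pr(\chi^2_{2m} \ge s) = e^{-s/2}\sum_{j=0}^{m-1}(s/2)^j/j!$. The sketch below uses that identity, and the helper names are hypothetical. The factor $\tfrac43$ comes from inverting the $2 \times 2$ covariance matrix with correlation $-\tfrac12$. The sample values are made up for illustration.

```python
import math

def lambda_stat(xs, ys):
    """x' C^{-1} x for independent pairs with unit variances and
    correlation -1/2: C^{-1} = (4/3) [[1, 1/2], [1/2, 1]] for each pair."""
    return (4.0 / 3.0) * sum(x * x + x * y + y * y for x, y in zip(xs, ys))

def chi2_tail_even_df(s, df):
    """P(chi-square with df degrees of freedom >= s), for even df."""
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half = s / 2.0
    term, total = 1.0, 0.0
    for j in range(df // 2):
        if j > 0:
            term *= half / j      # term is now (s/2)^j / j!
        total += term
    return math.exp(-half) * total

xs = [0.3, -0.8, 1.1, 0.2, -0.5]   # illustrative values, not data from Table 810
ys = [-0.4, 0.9, -0.2, 0.6, 0.1]
lam = lambda_stat(xs, ys)
p_lambda = chi2_tail_even_df(lam, df=2 * len(xs))
print(round(lam, 3), round(p_lambda, 3))
```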


APPENDIX IV. THE RELATIONSHIP BETWEEN BAYES FACTORS AND 
TAIL-AREA PROBABILITIES 


In the neo-Bayes-Laplace philosophy initial probabilities are judged only 
to lie in intervals of values. To suppose that they can be judged precisely is
an example of the precision fallacy. But the judgments that can be
“plugged in” as the input to a piece of statistical analysis are of many kinds,
and often one of them is that various precise initial probability distributions are 
good enough. Sometimes we are uncertain of this judgment and we may then 
find it convenient to try two or three distinct sets of assumptions in order to 
make sure that they lead to substantially the same results. 

Moreover we can often fall back on the “Bayes/non-Bayes synthesis” (as in 
Good [9]). It is a special case of the general recommendation that, when it is 
not too much effort, one should be consistent. The synthesis consists of three 
steps: 


(i) We use the neo-Bayes-Laplace philosophy (“neo-Bayesism”) in order to 
arrive at a factor, F, in favour of the non-null hypothesis. F may be circum- 
scribed by inequalities, but for definiteness we suppose that a precise value has 
been obtained. 

(ii) We then use $F$ “as a statistic” and try to obtain its distribution on the
null hypothesis, and work with its tail-area probability, $P$.

(iii) Finally we look to see whether $F$ lies in the range

$$\left(\frac{1}{30P},\ \frac{3}{10P}\right)$$

when $0.001 < P < 0.2$ (i.e. for the values of $P$ that are usually of most practical
interest). If it does not lie in this range we think again. This third step can be
applied to artificial sampling experiments, not just to the real-life problem.
A statistician who uses this synthesis will tend to force the inequality

$$\frac{1}{30P} < F < \frac{3}{10P}$$

to be true. If it were violently incorrect it would mean that the neo-Bayesist
and the non-Bayesist could come to very different decisions on the same
evidence. This does not usually happen.
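Step (iii) reduces to a simple range check. It can also be stated in terms of $\gamma = 1/(FP)$: the inequality holds exactly when $10/3 < \gamma < 30$. A minimal sketch follows, and the function names are hypothetical.

```python
def synthesis_check(F, P):
    """Step (iii) of the Bayes/non-Bayes synthesis.

    Returns None if P lies outside 0.001 < P < 0.2, where the rule is not
    meant to apply; otherwise whether 1/(30P) < F < 3/(10P)."""
    if not 0.001 < P < 0.2:
        return None
    return 1.0 / (30.0 * P) < F < 3.0 / (10.0 * P)

def gamma_ratio(F, P):
    """gamma = 1/(F P); the check above is equivalent to 10/3 < gamma < 30."""
    return 1.0 / (F * P)

print(synthesis_check(3.0, 0.05), round(gamma_ratio(3.0, 0.05), 2))
```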

As an example, we consider the problem of Appendix II, dealing with multivariate
normal distributions. Suppose that $D = \beta C$, so that $\Lambda$ is “largely
efficacious.” Let the values of $\beta$ for which $\mathcal{E}(\log_e F \mid H_1) = 1$ and $2$ be denoted
by $\beta_1$ and $\beta_2$, so that

$$\beta_1 - \log_e(1+\beta_1) = \frac{2}{k}, \qquad \beta_2 - \log_e(1+\beta_2) = \frac{4}{k}.$$

Now specialise to the problem of Appendix III, in which $k = 10$. Then $\beta_1 = 0.772$
and $\beta_2 = 1.18$. Denote the corresponding values of $F$ by $F_1$ and $F_2$, so that

$$F_i = (1+\beta_i)^{-k/2} \exp\left\{\tfrac12 \beta_i \Lambda/(1+\beta_i)\right\} \qquad (i = 1, 2).$$
We find the results shown in Table 812, where $\gamma_i = 1/(F_i P)$. Judging by either
$F_1$ or $F_2$, we may say that when $P = 0.1$ the evidence against the null hypothesis
is very feeble; when $P = 0.05$ the odds of the null hypothesis have lost a factor
of only 3; and when $P = 0.01$ the final odds of the null hypothesis are at
least ten to one against unless it was “odds on” initially.


TABLE 812

RATIO ($\gamma_1$ OR $\gamma_2$), FOR TWO DIFFERENT INITIAL DISTRIBUTIONS,
OF THE TAIL-AREA PROBABILITY (OF THE LIKELIHOOD-RATIO
STATISTIC) TO THE BAYES FACTOR ($F_1$ OR $F_2$), WHEN TESTING
THE MEAN OF A MULTIVARIATE NORMAL DISTRIBUTION
OF KNOWN COVARIANCE MATRIX

[Table body not recoverable from the scan.]

All the values of $\gamma$ in this example, which was selected after the main text
was written, lie in the approved range $(10/3, 30)$.
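The Bayes factors $F_1$, $F_2$ and the ratios $\gamma_i = 1/(F_i P)$ follow directly from the formula for $F_i$ once $\Lambda$ and its tail area $P$ are known. The sketch below is illustrative only: the helper names are hypothetical and it does not reproduce Table 812. It does give a rough check on the verbal summary. At the 5 per cent point of $\chi^2_{10}$ ($\Lambda \approx 18.31$), both $F_1$ and $F_2$ come out at roughly 3, in line with the statement that the odds have lost a factor of only 3.

```python
import math

def bayes_factor(lam, beta, k):
    """F = (1 + beta)^(-k/2) * exp(beta * lam / (2 * (1 + beta))),
    the factor in favour of the non-null hypothesis when D = beta * C."""
    return (1.0 + beta) ** (-k / 2.0) * math.exp(0.5 * beta * lam / (1.0 + beta))

def gamma_ratio(lam, beta, k, p):
    """gamma = 1 / (F P), with P the tail area of lam."""
    return 1.0 / (bayes_factor(lam, beta, k) * p)

k = 10
lam_05 = 18.307               # upper 5% point of chi-square with 10 d.f.
for beta in (0.772, 1.18):    # beta_1 and beta_2 from the text
    F = bayes_factor(lam_05, beta, k)
    print(beta, round(F, 2), round(gamma_ratio(lam_05, beta, k, 0.05), 2))
```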

As another example consider Lange’s data concerning the detection of
criminality among “identical” (monozygotic) twins and non-identical (dizygotic)
ones, quoted by Fisher⁴:

                 Convicted   Not convicted
Monozygotic         10             3
Dizygotic            2            15


Fisher shows that the exact probability of getting as extreme a result, or a 
more extreme one, given the null hypothesis of no association, is 1/2150. 


⁴ Fisher, R. A., Statistical Methods for Research Workers, 7th Edition. Edinburgh and London: Oliver and Boyd, 1938, 21.
Jeffreys gets a factor against the null hypothesis of 171. Hence in this example
we have $\gamma = 1/(FP) = 2150/171 \approx 12.6$.
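Both figures quoted here can be reproduced with exact integer arithmetic. The sketch below computes the one-sided Fisher exact probability for the $2 \times 2$ table above, with all margins fixed, counting outcomes where the monozygotic-convicted count is at least as large as observed. It then computes $\gamma$ from Jeffreys' factor of 171. It assumes Python 3.8 or later for `math.comb`, and the helper name is hypothetical.

```python
from fractions import Fraction
from math import comb

def fisher_one_sided(a, b, c, d):
    """P(top-left count >= a) for the 2x2 table [[a, b], [c, d]]
    under the hypergeometric distribution with all margins fixed."""
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, col1)
    tail = sum(comb(row1, i) * comb(n - row1, col1 - i)
               for i in range(a, min(row1, col1) + 1))
    return Fraction(tail, total)

# Lange's data: rows monozygotic / dizygotic; columns convicted / not convicted.
p = fisher_one_sided(10, 3, 2, 15)
print(p, float(1 / p))        # 1/p is about 2150
gamma = 1 / (171 * p)         # Jeffreys' factor against the null: F = 171
print(round(float(gamma), 1))
```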

Other examples are given in effect in Appendix I of Jeffreys’ book. He believes
that Bayesian and tail-area-probability methods can lead to opposite
decisions, but only rarely. This belief comes to much the same as saying that
usually $10/3 < \gamma < 30$ when $0.001 < P < 0.2$, provided that the null hypothesis is
initially about “evens.”
Ordinally invariant, i.e., rank, measures of association for bivariate
populations are discussed, with emphasis on the probabilistic and 
operational interpretations of their population values. The three 
measures considered at length are the quadrant measure, Kendall’s 
tau, and Spearman’s rho. Relationships between these measures are 
discussed, as are connections between these measures and certain 
measures of association for cross classifications. Sampling theory is 
surveyed with special attention to the motivation for sample values 
of the measures. The historical development of ordinal measures of 
association is outlined. 
1. INTRODUCTION


WHAT is meant by the degree of association or dependence between two
random variables with a joint distribution? For example, what is meant
by the degree of association between scores on two intelligence tests with re- 
spect to the population of seventh grade students in the United States today? 
Again, what is meant by the degree of association between 1955 income from 
wages and age among English wage earners? 
Obviously the above questions do not have unique answers. There are 
infinitely many possible measures of association, and it sometimes seems that 
almost as many have been proposed at one time or another. On the other hand,
it has been argued that, except in special cases, it is fatuous to attempt to rep- 
resent the degree of association of a bivariate population by a single number. 
For example, see Guldberg [38]. 

The major purpose of the following remarks is to discuss the probabilistic or 
operational interpretation of several well-known measures of association, par- 
ticularly those that are ordinally invariant in a sense to be defined. Although 
discussion of sample analogs and their distributions will be included, em- 
phasis will be on the interpretation of measures of association supposing the 
population known. For it seems desirable to decide what we would mean by 
association if we knew the population of interest completely, before turning to 
the more complex questions of making inferences about measures of association 
from a sample. There is little point in estimating a population characteristic 
if the meaning of that characteristic is not clear. 

The approach of this paper differs from the standard textbook approach, 
which begins with a sample, defines some intuitively reasonable sample 
measure of association, investigates its distribution, and—sometimes as an after- 
thought, but more often not at all—finally asks about the underlying popula- 
tion quantity. 

Concern with the population measure is important for the applied statistician 
in other contexts than that of point estimation. For example, in testing the null 
hypothesis of independence via a sample measure of association, the kinds of 
dependence of most interest as alternative hypotheses should guide us in the 
choice of the appropriate test statistic, and understanding of the population 
characteristic estimated by the test statistic is often of great help. 

Of course, if a strong structural assumption, typically that of bivariate 
normality, is made, the situation is very different. The ordinary correlation 
coefficient or its close relatives are natural measures of association in the bi- 
variate normal case, and they have quite clear-cut interpretations. In this 
paper, however, I shall be concerned primarily with measures of association ap- 
propriate where strong structural assumptions are not present. That is, I shall 
be concerned with nonparametric measures of association. 

It is important to recognize that the question, “Which single measure of 
association should I use?,” is often unimportant. There may be no reason why 
two or more measures should not be used; the point I stress is that, whichever 
ones are used, they should have clear-cut population interpretations. 

Much of the paper assumes no component-wise ties, or continuity of the 
marginals of the bivariate populations of interest. But the case of discrete
populations is also considered. Although the paper is largely self-contained, it 
would be desirable for the reader to have some acquaintance with M. G. Ken- 
dall’s monograph [51] and with the subject of measures of association in cross 
classifications as presented by Goodman and Kruskal [35] and [36]. 

To recapitulate: my purpose is to discuss operational interpretations for 
measures of association of bivariate populations, first supposing the popula- 
tions known and only later supposing them unknown, with inferences to be 
made by sampling. The measures to be discussed will nearly all be ordinally 
invariant and will not presume detailed structure such as bivariate normality. 








816 AMERICAN STATISTICAL ASSOCIATION JOURNAL, DECEMBER 1958 


While the purpose of the paper is to provide a unified exposition of material 
already largely known, some methods and results may be novel. 

The structure of the paper is as follows. Section 2 discusses as background 
the standard correlation ratio and correlation coefficient. Section 3 presents 
quadrant association and serves as a motivation for the sequence. Section 4 
presents Kendall’s $\tau$. Sections 5 and 6 suggest two approaches to Spearman’s
$\rho_s$. Sections 7 and 8 discuss inequalities between the three measures. Sections
10, 11, 12, and 13 consider estimation of the measures and questions of distri- 
bution for the estimators. Section 14 compares the measures. Sections 15 and 
16 extend the measures to general distributions, considering in particular wholly 
discrete cases (cross classifications), and relate the cross classification analogs to 
the no-ties estimators. Section 17 presents historical material, including a 
discussion of some little-known early papers by Lipps and Deuchler, and in 
passing mentions some related topics not covered earlier. 


2. THE CLASSICAL SECOND-MOMENT MEASURES 


The ordinary (Pearson, product-moment) correlation coefficient, and the 
correlation ratios, retain some interest even when an assumption of normality 
is absent. They do not form part of the central subject matter of this paper, 
but they should be mentioned briefly for the sake of completeness and contrast. 

Suppose, then, that we are given a specific bivariate distribution, expressed
in terms of the pair of random variables $(X, Y)$. A classical measure of association
is the correlation ratio of $Y$ on $X$, $\eta_{YX}$. This quantity may be defined in
two different, but formally equivalent, ways. The first is

$$\eta_{YX}^2 = \frac{\operatorname{Var} Y - E[\operatorname{Var}(Y \mid X)]}{\operatorname{Var} Y}, \tag{2.1}$$

i.e., $\eta_{YX}^2$ is the average reduction in the variance of $Y$ obtained by taking $Y$
conditionally on $X$, relative to the unconditional or marginal variance of $Y$.
The expectation in the numerator of (2.1) is with respect to $X$. The vertical
stroke is that of conditional probability; I assume that the conditional distributions
are defined. (For a detailed discussion of this point see Féron [30].)

Second,

$$\eta_{YX}^2 = \frac{\operatorname{Var}[E(Y \mid X)]}{\operatorname{Var} Y}, \tag{2.2}$$

or the variance of the conditional expectation of $Y$ given $X$, relative to the
unconditional variance of $Y$. The numerator of the above expression is
$E[E(Y \mid X) - EY]^2$, the expectation of the squared deviation of $E(Y \mid X)$
around the expected value of $Y$.

The possible values of $\eta_{YX}^2$ range from 0 to 1 inclusive. It is zero if and only
if the conditional expectation of $Y$ given $X$ is the same for all values of $X$
(strictly, almost all). It is unity if and only if the conditional distribution of
$Y$ given $X$ is concentrated on a single point which may depend on $X$, i.e. if
and only if $Y$ is a function of $X$.

The remarks of the above paragraph apply equally to $\eta_{YX}^2$ and to its positive
square root, $\eta_{YX}$, the correlation ratio itself. One may also switch coordinates
and define $\eta_{XY}$ analogously. Naturally, all the indicated moments are assumed
to exist, and $\operatorname{Var} X$ and $\operatorname{Var} Y$ are assumed positive to avoid rather trivial
special cases.
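For a distribution with finitely many support points, (2.1) and (2.2) can be evaluated exactly and compared. The sketch below uses rational arithmetic. The joint distribution is an arbitrary illustration, not taken from the paper, and the helper name is hypothetical.

```python
from fractions import Fraction as Fr

def eta_squared_both_ways(joint):
    """Correlation ratio eta_YX^2 computed from (2.1) and from (2.2).
    `joint` maps (x, y) support points to probabilities summing to 1."""
    e_y = sum(p * y for (x, y), p in joint.items())
    var_y = sum(p * (y - e_y) ** 2 for (x, y), p in joint.items())
    p_x = {}
    for (x, _), p in joint.items():
        p_x[x] = p_x.get(x, 0) + p
    # Conditional mean and conditional variance of Y for each value of X.
    m = {x: sum(p * y for (xx, y), p in joint.items() if xx == x) / p_x[x]
         for x in p_x}
    v = {x: sum(p * (y - m[x]) ** 2 for (xx, y), p in joint.items() if xx == x) / p_x[x]
         for x in p_x}
    first = (var_y - sum(p_x[x] * v[x] for x in p_x)) / var_y      # (2.1)
    second = sum(p_x[x] * (m[x] - e_y) ** 2 for x in p_x) / var_y  # (2.2)
    return first, second

joint = {(0, 0): Fr(1, 4), (0, 2): Fr(1, 4), (1, 1): Fr(1, 4),
         (1, 3): Fr(1, 8), (2, 0): Fr(1, 8)}
print(eta_squared_both_ways(joint))
```

The two values agree exactly because of the decomposition $\operatorname{Var} Y = E[\operatorname{Var}(Y \mid X)] + \operatorname{Var}[E(Y \mid X)]$.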

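The other classical measure of association is the correlation coefficient, $\rho_{XY}$,
which may be defined as $\operatorname{Cov}(X, Y)/\sqrt{\operatorname{Var} X \cdot \operatorname{Var} Y}$. Interpretations of $\rho_{XY}^2$,
parallel to the above interpretations for the correlation ratios, may be given
as follows. Let $\alpha + \beta X$ be the best linear estimate of $Y$ given $X$ in the least-squares
sense, i.e., $E[Y - a - bX]^2$ is minimized by $a = \alpha$ and $b = \beta$. Then
$\beta = \operatorname{Cov}(X, Y)/\operatorname{Var} X$ and $\alpha = EY - \beta EX$.¹ It follows that

$$\rho_{XY}^2 = \frac{\operatorname{Var} Y - E[Y - \alpha - \beta X]^2}{\operatorname{Var} Y}, \tag{2.3}$$

and that

$$\rho_{XY}^2 = \frac{\operatorname{Var}(\alpha + \beta X)}{\operatorname{Var} Y} = \frac{E[\alpha + \beta X - EY]^2}{\operatorname{Var} Y}. \tag{2.4}$$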

The first of these equations, (2.3), says that $\rho_{XY}^2$ is the reduction in the mean
squared deviation of $Y$ achieved by its “best” linear estimate, relative to the
marginal variance of $Y$. The second, (2.4), says that $\rho_{XY}^2$ is the variance
of the “best” estimate of $Y$ based on $X$, relative to the variance of $Y$. A restatement
is that $\rho_{XY}^2$ is the average squared deviation of the “best” fitting
straight line from the over-all average of $Y$, all relative to $\operatorname{Var} Y$. Since $\rho_{XY}$ is
symmetrical in the two components, analogous relations with $X$ and $Y$ interchanged
also hold. $\rho_{XY}$ can take values from $-1$ to $1$ inclusive. It is equal to
$\pm 1$ if and only if the joint distribution of $X$ and $Y$ is wholly concentrated on
a straight line. It is equal to zero if $X$ and $Y$ are independent, but the converse
does not in general hold.

It was, I believe, first pointed out by Maurice Fréchet [31] and [32] that
$\eta_{YX}^2 = \rho_{Y,E(Y \mid X)}^2$ and that

$$\rho_{XY} = \eta_{YX}\,\rho_{X,E(Y \mid X)} = \eta_{XY}\,\rho_{Y,E(X \mid Y)}.$$

These relations follow quite directly from the above definitions, and it then is
immediately clear that $\rho_{XY}^2 \le \eta_{YX}^2$ and $\rho_{XY}^2 \le \eta_{XY}^2$, with equality holding if and
only if the corresponding regression is linear.
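Fréchet's relations can be checked exactly on a small discrete distribution. In the sketch below, `M` stands for the random variable $E(Y \mid X)$. The code uses rational arithmetic, an illustrative distribution whose regression of $Y$ on $X$ is not linear, and hypothetical helper names.

```python
from fractions import Fraction as Fr

def expect(joint, f):
    """E f(X, Y) for a finite joint distribution {(x, y): prob}."""
    return sum(p * f(x, y) for (x, y), p in joint.items())

def cov(joint, f, g):
    mf, mg = expect(joint, f), expect(joint, g)
    return expect(joint, lambda x, y: (f(x, y) - mf) * (g(x, y) - mg))

def regression_of_y_on_x(joint):
    """E(Y | X = x) for each support value x."""
    p_x, s_x = {}, {}
    for (x, y), p in joint.items():
        p_x[x] = p_x.get(x, 0) + p
        s_x[x] = s_x.get(x, 0) + p * y
    return {x: s_x[x] / p_x[x] for x in p_x}

# Illustrative distribution; E(Y | X) takes the values 1, 5/3, 0 (not linear).
joint = {(0, 0): Fr(1, 4), (0, 2): Fr(1, 4), (1, 1): Fr(1, 4),
         (1, 3): Fr(1, 8), (2, 0): Fr(1, 8)}
m = regression_of_y_on_x(joint)

def X(x, y): return x
def Y(x, y): return y
def M(x, y): return m[x]          # E(Y | X) evaluated at the point (x, y)

def corr_sq(f, g):
    """Squared correlation coefficient between f(X, Y) and g(X, Y)."""
    return cov(joint, f, g) ** 2 / (cov(joint, f, f) * cov(joint, g, g))

eta_sq = cov(joint, M, M) / cov(joint, Y, Y)   # (2.2)
rho_sq = corr_sq(X, Y)
print(rho_sq, eta_sq)
```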

An important distinction is that the $\eta$'s are invariant under rearrangement
of the values of the ‘independent’ variable, while $\rho$ is not thus invariant. More
precisely, if $Z = \varphi(X)$, where $\varphi$ is one-to-one and measurable, then $\eta_{YX} = \eta_{YZ}$.
However, the correlation coefficient between $X$ and $Y$ may be very different
from that between $Z$ and $Y$. A more obvious, but still important, distinction
is that the correlation ratios are asymmetrical measures of association, while the
correlation coefficient is symmetric. All further measures of association to be
discussed here will be symmetric.

The above interpretations of $\rho$ and the $\eta$'s are in terms of expected or average
squared deviations. But probabilities are more basic than moments, and the
two bear only weak relations to each other unless strong structural assumptions
are made or unless a number of moments are given. Hence, in my opinion, the
above interpretations of the classical measures may not be suitable if we want
to apply our measures to general distributions. Exception could be taken to
this dogma in certain cases when loss functions proportional to squared deviations
exist.

¹ The same $\alpha$ and $\beta$ would result from minimization of $E[E(Y \mid X) - a - bX]^2$.

Another kind of general interpretation for the correlation coefficient may be 
stated in terms of common components. It stems, to the best of my knowledge, 
from a 1912 article by J. C. Kapteyn [49]. Suppose that the structure of X 
and Y is such that we may write 


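$$X = U_1 + U_2 + \cdots + U_m + V_1 + \cdots + V_n,$$
$$Y = U_1 + U_2 + \cdots + U_m + W_1 + \cdots + W_n,$$

where the $U$'s, $V$'s, and $W$'s are all mutually uncorrelated and all have the same
variance, $\sigma^2$ say. Then $\rho_{XY}$ is easily seen to be $m/(m+n)$, or the proportion of common
components between $X$ and $Y$: the covariance is $m\sigma^2$ and each variance is $(m+n)\sigma^2$. One or two variations on this theme have been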
discussed more recently. This kind of interpretation seems to be useful only 
when it makes substantive sense to think of X and Y as having the above 
kind of overlapping additive structure, with the U’s, V’s, and W’s correspond- 
ing to quantities of substantive interest. 
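A quick simulation illustrates that $\rho_{XY} = m/(m+n)$. It assumes independent uniform components, although any distribution with a common variance should do. This is only a sketch with hypothetical helper names, and the sample correlation fluctuates around the population value.

```python
import math
import random

def kapteyn_pair(m, n, rng):
    """X and Y share m common components and have n private ones each.
    Components are independent uniform(-1, 1), so they are uncorrelated
    with equal variances; normality is not needed."""
    common = sum(rng.uniform(-1, 1) for _ in range(m))
    x = common + sum(rng.uniform(-1, 1) for _ in range(n))
    y = common + sum(rng.uniform(-1, 1) for _ in range(n))
    return x, y

def pearson_r(pairs):
    """Sample product-moment correlation coefficient."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(1912)
m, n = 3, 2
sample = [kapteyn_pair(m, n, rng) for _ in range(20000)]
r = pearson_r(sample)
print(round(r, 3), m / (m + n))   # sample r should be close to 0.6
```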

There is an enormous literature on the correlation coefficient and the cor- 
relation ratios. Much of this literature discusses interpretation of these meas- 
ures of association only slightly or not at all. When interpretations are given, 
they are nearly always the same as those given above in terms of expected 
squared deviations. Uncritical use of the $\eta$'s and $\rho$ has been rightly criticized
by many writers, and from many viewpoints. In this connection, it is particularly
worth mentioning an investigation of the use and misuse of $\rho$ that was
carried out under the leadership of M. Fréchet and sponsored by the Inter- 
national Statistical Institute. This investigation culminated in a 1935 article 
[33] containing comments by many eminent statisticians. 

Some measures of association reflect aspects of concordance (greater values 
of X go with greater values of Y), while other measures reflect aspects of 
connection that do not take the sense or direction into account. For example,
$\rho$ is a measure of concordance while $\eta$ is a measure of connection. This distinction
between connection and concordance, although perhaps difficult to make pre- 
cise, is a useful one to bear in mind. It has been strongly stressed by Corrado 
Gini in his many writings on the subject. The ordinal measures of association 
that we shall now discuss in detail all reflect aspects of concordance. 


3. QUADRANT ASSOCIATION AND THE QUANTITY ϙ


Perhaps the simplest measure of association between two random variables 
is one directly related to the sum of the probabilities in the first and third 
quadrants of some natural Cartesian coordinate system. If we call the pair 
of variables $(X, Y)$, and if we let $(x_0, y_0)$ be some fixed value of $(X, Y)$, then
the quadrant measures of association are based on

$$\Pr\{(X > x_0 \text{ and } Y > y_0) \text{ or } (X < x_0 \text{ and } Y < y_0)\}, \tag{3.1}$$
where “Pr” means “probability of.” The above quantity may conveniently be
written

$$\Pr\{(X - x_0)(Y - y_0) > 0\}; \tag{3.2}$$

it is simply the probability that the deviations of $X$ and $Y$ from $x_0$ and $y_0$
respectively have the same signs, i.e., that $(X, Y)$ lies in the first or third quadrant
around $(x_0, y_0)$. Note that our assumption of continuity implies that it
is immaterial whether strong or weak inequalities are used in the above
expressions.

What should $x_0$ and $y_0$ be? Any choice is somewhat arbitrary, but a rather
natural one is to take $x_0$ as the median of $X$, Med $X$, and to take $y_0$ as the
median of $Y$, Med $Y$. If these medians are not uniquely defined, any median
value may be used, since nonuniqueness of definition simply means that there
is some interval for $X$ (say) with zero probability, and such that $X$ lies to the
left of the interval with probability ½ and to the right with probability ½. Hence
any point in the interval, chosen as median, will give the same numerical value
for (3.1) and (3.2).

Using the medians as $x_0$ and $y_0$ we may consider

$$\sigma_s = \Pr\{(X - \operatorname{Med} X)(Y - \operatorname{Med} Y) > 0\}, \tag{3.3}$$

or the probability that the deviations of $X$ and $Y$ from their respective medians
have the same sign ($\sigma$ for sign; $s$ for same). It is clear that $\sigma_s$ takes values between
0 and 1 inclusive. It is 1 if and only if $X - \operatorname{Med} X$ and $Y - \operatorname{Med} Y$ are positive
or negative together with probability one. It is zero if and only if $X - \operatorname{Med} X$
and $Y - \operatorname{Med} Y$ have different signs with probability one. If $X$ and $Y$ are independent,
$\sigma_s$ is equal to ½, since each of the two same-sign quadrants then has probability
½ · ½ = ¼ (but the converse need not be true).

We may also consider

$$\sigma_d = \Pr\{(X - \operatorname{Med} X)(Y - \operatorname{Med} Y) < 0\}, \tag{3.4}$$

or the probability that $(X, Y)$ lies in the second or fourth quadrant of a
coordinate system with origin at $(\operatorname{Med} X, \operatorname{Med} Y)$. Equivalently, $\sigma_d$ is just the
probability that $X - \operatorname{Med} X$ and $Y - \operatorname{Med} Y$ have different signs. Clearly
$\sigma_s + \sigma_d = 1$.

Using both $\sigma_s$ and $\sigma_d$, a natural and more symmetric quantity is the quadrant
measure

$$\text{ϙ} = \sigma_s - \sigma_d = 2\sigma_s - 1, \tag{3.5}$$

the difference between the probabilities of same and different signs for the
deviations of $X$ and $Y$ from their medians. If and only if $X$ and $Y$ always deviate
from their medians in the same direction, ϙ is 1; if and only if they always
deviate in opposite directions, ϙ is $-1$. If $X$ and $Y$ are independent ϙ is zero
(but the converse need not be true).

(Typographical note: I shall use Greek letters for population quantities, and
corresponding Latin letters for their sample analogues. The letter “q” is so
natural a symbol for (3.5) that I think it psychologically mandatory to use it,
yet the standard Greek alphabet contains no “q”. Older Greek, however, does
have a “q,” the letter koppa, which is here used and printed thus: ϙ. Although
koppa was superseded in standard classical Greek by kappa, it remained in the
Greek numbering system as a symbol for 90. It appears in several variant forms,
some resembling the capital Latin “G” and some the lower case Latin “q”.)

Thus ϙ provides a measure of association that lies in the customary range
$-1$ to $1$, takes the value 0 in the case of independence, and takes its extreme
values in well-defined extreme cases. But one should not interpret intermediate
numerical values of ϙ in the light of preconceptions about numerical values of
other quantities that lie in the same range, for example the ordinary (product-moment)
correlation coefficient. The interpretation of ϙ is precisely that of the
difference between two probabilities as stated above. In the bivariate normal
case there is a simple relation between the correlation coefficient, $\rho$, and ϙ:
$\rho = \sin[(\pi/2)\,\text{ϙ}]$. See, e.g., Cramér [12, p. 290]. Another name for ϙ is “the
coefficient of medial correlation”; see Quenouille [71, Chapter 3]. It may also
be noted that ϙ is the ordinary correlation coefficient between $\operatorname{sgn}(X - \operatorname{Med} X)$
and $\operatorname{sgn}(Y - \operatorname{Med} Y)$.
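A sample analogue q can be computed by replacing Med $X$ and Med $Y$ with sample medians. This is one natural choice, assumed here rather than taken from the paper. The simulation sketch below also illustrates the bivariate normal relation $\rho = \sin[(\pi/2)\,\text{ϙ}]$. With $\rho = 0.5$, ϙ equals $(2/\pi)\arcsin 0.5 = 1/3$. The helper name is hypothetical.

```python
import math
import random
from statistics import median

def quadrant_q(xs, ys):
    """Sample analogue of (3.5): proportion of points whose deviations from
    the sample medians have the same sign, minus the proportion with
    different signs. Points lying exactly on a median count as neither."""
    mx, my = median(xs), median(ys)
    same = diff = 0
    for x, y in zip(xs, ys):
        s = (x - mx) * (y - my)
        if s > 0:
            same += 1
        elif s < 0:
            diff += 1
    return (same - diff) / len(xs)

# Bivariate normal with rho = 0.5; the population value of the quadrant
# measure is (2/pi) * arcsin(0.5) = 1/3.
rng = random.Random(42)
rho, n = 0.5, 20001
xs, ys = [], []
for _ in range(n):
    u, v = rng.gauss(0, 1), rng.gauss(0, 1)
    xs.append(u)
    ys.append(rho * u + math.sqrt(1 - rho ** 2) * v)
q = quadrant_q(xs, ys)
print(round(q, 3), round(math.sin(math.pi / 2 * q), 3))   # near 1/3 and 0.5
```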

The meaning of 9 may be expressed in terms of the following example: 
suppose that we are concerned with the association between two “intelligence” 
tests with respect to a given population. Scores on the two tests correspond 
to the random variables X and Y. If we hypothetically choose an individual 
at random from the population of interest, then ϙ is the probability that that
individual’s two intelligence scores will both deviate from the respective popu- 
lation medians in the same direction, minus the probability that they will 
deviate in different directions. 

Another interpretation is the following: suppose that you are a commodity 
speculator, betting in effect on the price of wheat in December on the basis of 
its price in September. Suppose, further, that when the price of wheat is above 
its long-run September median you bet that it will also be above its long-run 
December median for that year; whereas if it is below its September median, 
you bet that it will also be below its December median. Finally, in this grossly 
simplified market, suppose that each year you either win or lose $1000 depend- 
ing on whether your bet on the December median turned out correctly or not. 
Then your expected or average income is $1000 × ϙ.

Notice that ϙ, unlike the correlation coefficient, remains unchanged by
monotone functional transformations of the coordinates: if, instead of $X$ and
$Y$, we consider $f(X)$ and $g(Y)$, where $f$ and $g$ are both strictly increasing
(or both strictly decreasing), then ϙ is unchanged, since such transformations
carry medians into medians and preserve (or, for decreasing transformations,
reverse) the sign of each deviation. If one of $f$, $g$ is strictly increasing and the
other strictly decreasing, then the value of ϙ simply has its sign switched. Thus ϙ
is an ordinal (i.e., ordinally invariant) measure of association; the same will be
true of the other measures to be considered next.
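The invariance can be seen numerically. With odd $n$, a strictly monotone transformation carries the sample median to the sample median and preserves or reverses the sign of every deviation. The sample quadrant measure is therefore unchanged when both coordinates are transformed increasingly, and its sign flips when exactly one is reversed. A minimal sketch with made-up data follows, and the helper name is hypothetical.

```python
import math
from statistics import median

def quadrant_q(xs, ys):
    """Sample quadrant measure: (same-sign count - different-sign count) / n,
    with deviations taken from the sample medians."""
    mx, my = median(xs), median(ys)
    score = 0
    for x, y in zip(xs, ys):
        s = (x - mx) * (y - my)
        score += (s > 0) - (s < 0)
    return score / len(xs)

# Odd n and no ties, so each sample median is one of the observations.
xs = [0.3, -1.2, 2.5, 0.9, -0.4, 1.7, -2.1]
ys = [0.1, -0.7, 1.9, -0.2, 0.4, 2.2, -1.5]

q = quadrant_q(xs, ys)
q_both_increasing = quadrant_q([math.exp(x) for x in xs], [y ** 3 for y in ys])
q_one_decreasing = quadrant_q([-x ** 3 for x in xs], [y ** 3 for y in ys])
print(q, q_both_increasing, q_one_decreasing)   # 2/7, 2/7, -2/7
```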

The desirability of using ordinal measures for nonparametric work has been 
defended by some writers (see, for example, Wolfowitz [93, p. 104] and 
Hoeffding [40 and 41]), and most, but not all, statistical procedures that fall 
under the loose rubric of nonparametric analysis are invariant under wide 
classes of monotone transformations. (An important class of exceptions is the
family of tests, first discussed by R. A. Fisher, based upon permutations of the
observed sample.) The consequences of using an ordinal measure of association
between two random variables might, however, be counterintuitive in some
contexts. For example, consider the two distributions described graphically by
Figure 821.
[Fig. 821. Distribution 1 and Distribution 2: probability masses spread over squares in the (X, Y) plane; figure not reproduced.]


(In Figure 821 the probability mass is thought of as smoothly spread over
the indicated squares, so that they have the probabilities given by the attached
fractions.) Both distributions would give rise to a value of ½ for the quadrant
measure ϙ, although, if a metric were relevant, distribution 1 might seem
intuitively to exhibit sharper association. Ordinal measures do have the advantage
that they may be estimated from a sample in which numerical observations
are lacking and only the joint marginal orderings, or ranks, are available.

It is obvious that the quadrant measure ϙ is not only ordinally invariant,
but is also invariant under any transformations maintaining quadrant probabilities.
For example, the two distributions described graphically by Figure 821a
would both give rise to ½ as the value of ϙ, yet distribution 4 certainly seems to
exhibit higher intuitive association than does distribution 3.
The measures of association next to be described may be doubly motivated
by (1) the above lack of sensitivity of ϙ and (2) arbitrariness in the choice of
$x_0$ and $y_0$ required for quadrant measures of association.


4. CONCORDANCE OF TWO “OBSERVATIONS” AND THE QUANTITY τ


Perhaps the most natural way to avoid arbitrariness in the choice of $x_0$ and
$y_0$ is to average $\Pr\{(X - x_0)(Y - y_0) > 0\}$ over all $(x_0, y_0)$, with weights given
by the joint distribution of $(X, Y)$ itself. Or perhaps we might average
$\Pr\{(X - x_0)(Y - y_0) > 0\} - \Pr\{(X - x_0)(Y - y_0) < 0\}$. This kind of averaging
amounts to considering the probability


$$\Pi_c = \Pr\{(X_1 - X_2)(Y_1 - Y_2) > 0\} = \Pr\{X_1 > X_2 \text{ and } Y_1 > Y_2\} + \Pr\{X_1 < X_2 \text{ and } Y_1 < Y_2\} \tag{4.1}$$

and its complement

$$\Pi_d = \Pr\{(X_1 - X_2)(Y_1 - Y_2) < 0\}. \tag{4.2}$$


Here $(X_1, Y_1)$ and $(X_2, Y_2)$ are taken as two independent bivariate random
variables, each with the bivariate distribution under consideration. In other
words, $\Pi_c$ is the probability that two hypothetical bivariate observations on
the distribution of interest are concordant, in the sense that the two $x$ coordinates
differ with the same sign as the two $y$ coordinates. $\Pi_d$ has a similar
meaning but for discordance: different signs for the two differences.² Thus $\Pi_c$
is just $\sigma_s$ evaluated, not for $(X, Y)$, but for the difference between $(X, Y)$ and
an identically distributed but independent bivariate random variable, i.e.,
evaluated for the distribution of $(X_1 - X_2, Y_1 - Y_2)$. This distribution will of
course have both medians equal to zero.

To avoid possible confusion, note that the two observations mentioned above 
are not two observations of a sample from which we might want to estimate 
a measure of association, but rather are hypothetical observations about which 
we are entitled to think apart from any real sampling situation. 

A convenient measure of association based on $\Pi_c$ and $\Pi_d$ is


$$\tau = \Pi_c - \Pi_d = 2\Pi_c - 1 = 1 - 2\Pi_d, \qquad (4.3)$$


the difference between the probabilities of concordance and discordance for two observations on the distribution of interest. $\tau$ has, therefore, a direct and simple operational meaning. We also see that $\tau$ is $q$ for the distribution of $(X_1 - X_2, Y_1 - Y_2)$, or the correlation coefficient between the signs of $(X_1 - X_2)$ and $(Y_1 - Y_2)$. For this reason $\tau$ has sometimes been called the difference sign correlation. Hoeffding has called it [44] the difference sign covariance (since $\operatorname{Var}[\operatorname{sgn}(X_1 - X_2)] = \operatorname{Var}[\operatorname{sgn}(Y_1 - Y_2)] = 1$).

Several authors have independently proposed $\tau$, or its sample analogue, as a measure of association; the basic notion seems to derive from G. T. Fechner’s work in 1897, and to have been first discussed in some detail by G. F. Lipps around 1905. The most recent independent proposal of $\tau$ is that of M. G. Kendall in 1938. Kendall gave a very thorough discussion of $\tau$ and its associated sampling theory; the measure is sometimes called Kendall’s $\tau$. More detailed historical remarks about this and the other measures appear in section 17 at the end of this paper.

² These definitions may be restated as follows: $(X_1, Y_1)$ and $(X_2, Y_2)$ are concordant [discordant] as the line segment joining the two points has positive [negative] slope. It is immediate that concordance and discordance are ordinally invariant. The possibilities of zero or infinite slope may be neglected here, since marginal ties have, by assumption, zero probability. Later on the question of ties will be discussed.

If the bivariate distribution is normal, $\tau$ is related to the standard (Pearson product-moment) correlation coefficient, $\rho$, by the formula $\rho = \sin[(\pi/2)\tau]$. This is a direct consequence of the analogous formula for $q$, and the observation that the correlation coefficient between $X_1 - X_2$ and $Y_1 - Y_2$ is the same as that between $X_1$ and $Y_1$.
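As a numerical illustration of (4.3) and of the normal-theory relation just stated, one can estimate $\Pi_c$ and $\Pi_d$ by drawing many independent pairs of hypothetical observations. The Python sketch below assumes a standard bivariate normal with $\rho = 0.6$; the function names, sample size and seed are illustrative choices, not part of the text. Up to simulation error, $\hat\tau$ should match $(2/\pi)\arcsin\rho$, and $\sin(\pi\hat\tau/2)$ should recover $\rho$.

```python
import math
import random


def draw_normal_pair(rho, rng):
    """One (X, Y) from a standard bivariate normal with correlation rho."""
    x = rng.gauss(0.0, 1.0)
    y = rho * x + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
    return x, y


def estimate_tau(rho, n_pairs=100_000, seed=1):
    """Monte Carlo estimates of Pi_c, Pi_d and tau = Pi_c - Pi_d (eq. 4.3)."""
    rng = random.Random(seed)
    conc = disc = 0
    for _ in range(n_pairs):
        x1, y1 = draw_normal_pair(rho, rng)
        x2, y2 = draw_normal_pair(rho, rng)
        s = (x1 - x2) * (y1 - y2)   # positive: concordant; negative: discordant
        if s > 0:
            conc += 1
        elif s < 0:
            disc += 1
    pi_c, pi_d = conc / n_pairs, disc / n_pairs
    return pi_c, pi_d, pi_c - pi_d


rho = 0.6
pi_c, pi_d, tau_hat = estimate_tau(rho)
tau_normal = (2 / math.pi) * math.asin(rho)   # rho = sin(pi * tau / 2), inverted
print(f"Pi_c={pi_c:.3f} Pi_d={pi_d:.3f} tau_hat={tau_hat:.3f} tau_normal={tau_normal:.3f}")
print(f"sin(pi * tau_hat / 2) = {math.sin(math.pi * tau_hat / 2):.3f}  (rho = {rho})")
```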

From its definition, $\tau$ is ordinally invariant. It lies between $-1$ and $1$ inclusive, taking the value $+1$ ($-1$) if and only if all the probability mass lies on the graph of an increasing (decreasing) function. If $X$ and $Y$ are independent, $\tau = 0$, but the converse is not in general true.

A rewording of the interpretation of $\tau$ is the following. Suppose that observations $(X_1, Y_1)$ and $(X_2, Y_2)$ are drawn but that only $X_1$ and $X_2$ are revealed to us at first by some sort of mechanical device. Suppose further that we agree to play a game wherein we predict $Y_1 < Y_2$ when $X_1 < X_2$, and $Y_1 > Y_2$ when $X_1 > X_2$. If our prediction turns out to be correct we win one dollar; if wrong, we lose one dollar. After prediction, the mechanical device reveals $Y_1$ and $Y_2$ and the pay-off is made. Our expected gain in dollars is $\tau$.

We may consider another example. Suppose that we are interested in the degree of association between two “intelligence” tests for some very large specified population. Think of taking two individuals at random from the population and comparing their scores on the two tests. $\Pi_c$ is the probability that the more “intelligent” according to one test is also the more “intelligent” according to the other. $\Pi_d$ is the probability that the orderings differ. And $\tau$ is just the difference of these two probabilities. For this description to correspond perfectly with our prior more abstract discussion, it must be supposed that the population is infinitely large and that there is zero probability of ties.


5. ANOTHER METHOD OF AVERAGING AND THE QUANTITY $\rho_S$


In the prior section we averaged $\Pr\{(X - x_0)(Y - y_0) > 0\}$ over all values of $(x_0, y_0)$ weighted according to the distribution of $(X, Y)$ itself. However, it might be reasonable to average with respect to the marginal distributions, taken independently. In other words, we might consider $\Pr\{(X - X')(Y - Y'') > 0\}$ where $X'$ has the distribution of $X$, $Y''$ has the distribution of $Y$, and where the pair $(X, Y)$ and the two single variates, $X'$ and $Y''$, are all three independent.

To rephrase this slightly, suppose that we take three hypothetical independent observations from the bivariate distribution of interest: $(X_1, Y_1)$, $(X_2, Y_2)$, and $(X_3, Y_3)$. Then consider the probability of concordance between $(X_1, Y_1)$ and the “crossed observation” $(X_2, Y_3)$,

$$\pi_c = \Pr\{(X_1 - X_2)(Y_1 - Y_3) > 0\}. \qquad (5.1)$$

(Clearly we could have used $(X_3, Y_2)$ instead of $(X_2, Y_3)$ without affecting the above quantity.) Similarly consider the probability of discordance between $(X_1, Y_1)$ and $(X_2, Y_3)$, $\pi_d$, and subtract it from the probability of concordance to obtain

$$\pi_c - \pi_d = \Pr\{(X_1 - X_2)(Y_1 - Y_3) > 0\} - \Pr\{(X_1 - X_2)(Y_1 - Y_3) < 0\} = 2\Pr\{(X_1 - X_2)(Y_1 - Y_3) > 0\} - 1 = 2\pi_c - 1.$$


This quantity thus has a direct interpretation, although not quite so direct as $\tau$. Under independence between $X$ and $Y$ it is zero, but not conversely in general. However, its minimum and maximum values are $-\tfrac{1}{3}$ and $+\tfrac{1}{3}$. These are taken on if and only if $Y$ is a strictly monotone function of $X$, decreasing for the minimum and increasing for the maximum.

To see this, note that all the quantities considered are ordinally invariant; hence we may perform probability integral transformations on the component random variables without affecting any of the above quantities. This means that we replace each $X_i$ by $X_i^* = F(X_i)$, and each $Y_i$ by $Y_i^* = G(Y_i)$, where $F$ is the cumulative distribution function of $X$, and $G$ that of $Y$. When we do this, $X_i^*$ and $Y_i^*$ have marginal distributions uniform between 0 and 1. Then


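$$\begin{aligned}
\pi_c &= \Pr\{(X_1 - X_2)(Y_1 - Y_3) > 0\} = \Pr\{(X_1^* - X_2^*)(Y_1^* - Y_3^*) > 0\} \qquad (5.2)\\
&= E\left[\Pr\{(X_1^* - X_2^*)(Y_1^* - Y_3^*) > 0 \mid (X_1^*, Y_1^*)\}\right]\\
&= E\left[X_1^* Y_1^* + (1 - X_1^*)(1 - Y_1^*)\right] \qquad (5.3)\\
&= 2E[X_1^* Y_1^*] = 2\operatorname{Cov}(X_1^*, Y_1^*) + \tfrac{1}{2}\\
&= \tfrac{2}{12}\,(\text{corr. coeff. bet. } X_1^*, Y_1^*) + \tfrac{1}{2}.
\end{aligned}$$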


Since the correlation coefficient can take all values between $-1$ and $1$, and only those values, we see that $\pi_c$ can take all values between $\tfrac{1}{3}$ and $\tfrac{2}{3}$. Hence $2\pi_c - 1$ can take all values between $-\tfrac{1}{3}$ and $\tfrac{1}{3}$ and only those values. (The correlation coefficient between $X_1^*$ and $Y_1^*$ has been called by K. Pearson the grade correlation coefficient between $X$ and $Y$.)

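In the above manipulations, the transition from the second to the third line follows from the fact that $(X_2^*, Y_3^*)$ has the uniform distribution over the unit square. (Given $(X_1^*, Y_1^*) = (x, y)$, the independent point $(X_2^*, Y_3^*)$ lies below and to the left with probability $xy$ and above and to the right with probability $(1 - x)(1 - y)$.) The symbol “Cov” means: covariance of. Further, we make use of the facts that the mean and variance of the uniform distribution over the unit interval are $\tfrac{1}{2}$ and $\tfrac{1}{12}$ respectively. The vertical stroke in the second line is that of conditional probability.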

In order to shift our measure to a scale running from $-1$ to $1$ in the conventional way, it is convenient to multiply by 3, thus obtaining finally

$$\rho_S = 3(\pi_c - \pi_d) = 6\pi_c - 3 = 3 - 6\pi_d. \qquad (5.4)$$


This is the population analogue, in one natural sense at least, of the so-called Spearman rank correlation coefficient, whence the subscript $S$. $\rho_S$ is equal to the grade correlation coefficient between $X$ and $Y$.

Clearly $\rho_S$ is ordinally invariant. It takes the value $+1$ ($-1$) just when all the probability mass is on the graph of an increasing (decreasing) function. If $X$ and $Y$ are independent, $\rho_S$ is zero, but not conversely in general.
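The identity $\rho_S = 6\pi_c - 3$ of (5.4), and the statement that $\rho_S$ equals the grade correlation, can be checked by simulation on a distribution whose marginals are already uniform. The sketch assumes a hypothetical mixture: $U$ is uniform on (0, 1), and $V$ equals $U$ with probability $p$ and is an independent uniform otherwise, so both marginals are uniform and the grade correlation is exactly $p$. All names and parameter values are illustrative.

```python
import random


def draw_mixture(p, rng):
    """Hypothetical example: U ~ U(0,1); V = U with prob. p, else an independent U(0,1).
    Both marginals are uniform, so (U, V) are their own grades."""
    u = rng.random()
    v = u if rng.random() < p else rng.random()
    return u, v


def rho_s_from_crossed_triples(p, n_triples=200_000, seed=2):
    """Estimate pi_c = Pr{(X1 - X2)(Y1 - Y3) > 0} and return 6*pi_c - 3 (eq. 5.4)."""
    rng = random.Random(seed)
    conc = 0
    for _ in range(n_triples):
        x1, y1 = draw_mixture(p, rng)
        x2, _ = draw_mixture(p, rng)   # only X2 of the second observation is used
        _, y3 = draw_mixture(p, rng)   # only Y3 of the third observation is used
        if (x1 - x2) * (y1 - y3) > 0:
            conc += 1
    return 6 * conc / n_triples - 3


def grade_correlation(p, n=200_000, seed=3):
    """Sample Pearson correlation of (U, V); with uniform marginals this estimates
    the grade correlation."""
    rng = random.Random(seed)
    pts = [draw_mixture(p, rng) for _ in range(n)]
    mu = sum(u for u, _ in pts) / n
    mv = sum(v for _, v in pts) / n
    cov = sum((u - mu) * (v - mv) for u, v in pts) / n
    var_u = sum((u - mu) ** 2 for u, _ in pts) / n
    var_v = sum((v - mv) ** 2 for _, v in pts) / n
    return cov / (var_u * var_v) ** 0.5


p = 0.5
print(rho_s_from_crossed_triples(p), grade_correlation(p))   # both should be near p
```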
6. ANOTHER APPROACH TO $\rho_S$


Another point of view towards $\rho_S$ may be adopted that in some ways seems more natural. In particular, it obviates introduction of the arbitrary factor 3 as in section 5. Suppose that we consider, as before, three hypothetical independent observations on $(X, Y)$ and ask for the probability that at least one of the three is concordant with both the other two. Let us call this quantity $\omega_c$, and call the corresponding quantity for discordance $\omega_d$. Clearly $\omega_c + \omega_d = 1$. What is the relation between $\omega_c$ and $\pi_c$?

The three bivariate observations may occur in six different patterns, viewing 
them without regard to which is numbered observation 1, which 2, and which 
3. These patterns are given in Table 825. 


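TABLE 825

THE SIX DIFFERENT PATTERNS OBTAINABLE WHEN DRAWING THREE OBSERVATIONS (WITHOUT TIES) FROM A BIVARIATE POPULATION

| Pattern picture | Description | $y$ permutation | In chance event whose probability is $\omega_c$? | How many of six equally likely assignments of names lead to chance event $(X_1 - X_2)(Y_1 - Y_3) > 0$? |
|---|---|---|---|---|
| (not reproduced) | All conc. | 123 | Yes | 4 |
| (not reproduced) | Lower left conc. with two others, but two others not conc. | 132 | Yes | 4 |
| (not reproduced) | Upper right conc. with two others, but two others not conc. | 213 | Yes | 4 |
| (not reproduced) | Upper left disc. with two others, and two others conc. | 312 | No | 2 |
| (not reproduced) | Lower right disc. with two others, and two others conc. | 231 | No | 2 |
| (not reproduced) | All disc. | 321 | No | 2 |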
The first column shows the picture, and the second column gives its verbal description. The third column gives a convenient description of the pattern in terms of the permutations of the integers 1, 2, and 3; let “1” be assigned to the dot with smallest $y$ coordinate, “2” to the dot with next smallest, and “3” to the dot with greatest $y$ coordinate. Then write down the three integers in order of $x$ coordinate magnitude. For example, in the second line, the dot with smallest $x$ coordinate is numbered 1, since it has smallest $y$ coordinate; thus “1” comes first. The dot with middle $x$ coordinate is numbered 3 since it has the greatest $y$ coordinate, etc. This kind of description has proved quite useful.

The fourth column simply says whether or not each pattern falls into the chance event: at least one $(X_i, Y_i)$ concordant with the other two. If we look at the number of inversions of neighboring integers necessary to achieve the primary ordering 1, 2, 3 for the $y$ permutations of the third column, we see that a pattern falls into the above chance event just when its $y$ permutation requires 0 or 1 inversions to achieve primary ordering. A pattern fails to fall in the above chance event just when its $y$ permutation requires 2 or 3 inversions to achieve the primary ordering.

Now for each pattern, there are six actual orderings of the three observations, for there are six ways of numbering the dots “1,” “2,” and “3.” Given the pattern, each of these six ways is equally likely, since the observations are independent and identically distributed. The last column of Table 825 shows how many of these six ways lead to satisfaction of the chance event $(X_1 - X_2)(Y_1 - Y_3) > 0$. For example, on the first line, either of the two numberings in which the lower left point is $(X_1, Y_1)$ satisfies this chance event. Also either of the two numberings in which the upper right point is $(X_1, Y_1)$ satisfies it. But if the middle point is $(X_1, Y_1)$, then $(X_1, Y_1)$ and $(X_2, Y_3)$ are discordant. Hence $2 + 2 + 0 = 4$ appears in the last column.
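The entries of Table 825 can be regenerated mechanically. The Python sketch below builds each pattern from its $y$ permutation (dots at $x$ ranks 1, 2, 3 carrying the listed $y$ ranks), then counts inversions, checks the $\omega_c$ event, and counts the six equally likely namings that satisfy $(X_1 - X_2)(Y_1 - Y_3) > 0$. The helper names are illustrative.

```python
from itertools import permutations


def concordant(a, b):
    return (a[0] - b[0]) * (a[1] - b[1]) > 0


def pattern_points(perm):
    """Dots at x ranks 1, 2, 3 carrying the y ranks listed in perm (Table 825 notation)."""
    return [(x, y) for x, y in zip((1, 2, 3), perm)]


def inversions(perm):
    """Number of out-of-order pairs = neighbor inversions needed to reach 1, 2, 3."""
    return sum(1 for i in range(3) for j in range(i + 1, 3) if perm[i] > perm[j])


def in_omega_c_event(pts):
    """True if at least one dot is concordant with both of the other two."""
    return any(all(concordant(pts[i], pts[j]) for j in range(3) if j != i) for i in range(3))


def namings_with_crossed_concordance(pts):
    """How many of the 6 namings (dot -> observation 1, 2, 3) give (X1-X2)(Y1-Y3) > 0."""
    return sum(1 for o1, o2, o3 in permutations(pts)
               if (o1[0] - o2[0]) * (o1[1] - o3[1]) > 0)


table = {}
for perm in [(1, 2, 3), (1, 3, 2), (2, 1, 3), (3, 1, 2), (2, 3, 1), (3, 2, 1)]:
    pts = pattern_points(perm)
    key = "".join(map(str, perm))
    table[key] = (inversions(perm), in_omega_c_event(pts), namings_with_crossed_concordance(pts))
    print(key, *table[key])
```

Weighting the last entry of each row by 1/6 and summing over patterns reproduces the step from Table 825 to (6.1).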


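It follows that

$$\pi_c = \Pr\{(X_1 - X_2)(Y_1 - Y_3) > 0\} = \tfrac{4}{6}\,\omega_c + \tfrac{2}{6}\,\omega_d = \tfrac{1}{3}(\omega_c + 1). \qquad (6.1)$$

Quite analogously,

$$\Pr\{(X_1 - X_2)(Y_1 - Y_3) < 0\} = \tfrac{1}{3}(\omega_d + 1), \qquad (6.2)$$

so that

$$\rho_S = \omega_c - \omega_d. \qquad (6.3)$$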


This is perhaps the most natural definition of $\rho_S$: the difference between the probabilities (among three observations) that (a) at least one will be concordant with the other two and (b) at least one will be discordant with the other two.

In the case of a normal bivariate distribution, $\rho_S$ and $\rho$ (the correlation coefficient) are related by the equation

$$\rho_S = \frac{6}{\pi}\arcsin\frac{\rho}{2}. \qquad (6.4)$$


This is easily derived as follows. If $(X, Y)$ is a hypothetical observation from our normal distribution of interest and $(X', Y')$ is an independent observation from the bivariate normal distribution with the same marginals as $(X, Y)$ but with independent coordinates, then

$$\rho_S = 3\Pr\{(X - X')(Y - Y') > 0\} - 3\Pr\{(X - X')(Y - Y') < 0\}. \qquad (6.5)$$

Without loss of generality, let $X$, $Y$, $X'$, and $Y'$ have zero means and unit variances. The correlation coefficient between $X$ and $Y$ is $\rho$; that between $X'$ and $Y'$ is zero. Then $X - X'$ and $Y - Y'$ have a bivariate normal distribution with zero means, variances 2, and correlation coefficient $\rho/2$ (their covariance is $\rho$). It follows from our earlier discussion of the quadrant measure, applied to the distribution of $(X - X', Y - Y')$, that


$$\Pr\{(X - X')(Y - Y') > 0\} - \Pr\{(X - X')(Y - Y') < 0\} = \frac{2}{\pi}\arcsin(\rho/2). \qquad (6.6)$$

From this, (6.4) follows immediately.
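Equation (6.3) also gives a direct simulation check of (6.4): draw triples from a bivariate normal, estimate $\omega_c$ as the fraction of triples in which some point is concordant with the other two, and compare $\omega_c - \omega_d = 2\omega_c - 1$ with $(6/\pi)\arcsin(\rho/2)$. A Python sketch, with an arbitrary $\rho = 0.6$, sample size and seed:

```python
import math
import random


def omega_c_event(pts):
    """At least one of the three points is concordant with both of the others."""
    return any(
        all((pts[i][0] - pts[j][0]) * (pts[i][1] - pts[j][1]) > 0 for j in range(3) if j != i)
        for i in range(3)
    )


def rho_s_by_triples(rho, n_triples=100_000, seed=4):
    """Monte Carlo estimate of omega_c - omega_d (eq. 6.3) for a standard bivariate normal."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_triples):
        pts = []
        for _ in range(3):
            x = rng.gauss(0.0, 1.0)
            y = rho * x + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
            pts.append((x, y))
        hits += omega_c_event(pts)   # True counts as 1
    omega_c = hits / n_triples
    return omega_c - (1.0 - omega_c)


rho = 0.6
# simulated value vs. (6.4); expect agreement to within simulation error
print(rho_s_by_triples(rho), (6 / math.pi) * math.asin(rho / 2))
```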

Interpretations of $\rho_S$, similar to those given at the end of sections 3 and 4 for $q$ and $\tau$, present no difficulty of statement, but are not so directly intuitive, since they involve three hypothetical observations, rather than one or two.


7. RELATIONS BETWEEN $\tau$ AND $\rho_S$


The probabilities of the six patterns of Table 825 must of course sum to unity; but there are other relations between them that give rise to inequalities between $\tau$ and $\rho_S$.

The first of these was pointed out by H. E. Daniels [14]; it is 


$$-1 \le 3\tau - 2\rho_S \le 1. \qquad (7.1)$$


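A direct proof follows. Denote by $p_{123}$, $p_{132}$, etc., the probabilities of the six patterns of Table 825. Then

$$\begin{aligned}
\Pi_c &= p_{123} + \tfrac{2}{3}(p_{132} + p_{213}) + \tfrac{1}{3}(p_{312} + p_{231}),\\
\omega_c &= p_{123} + p_{132} + p_{213}.
\end{aligned} \qquad (7.2)$$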


The second part of (7.2) is immediate from Table 825; the first part follows by noting that the probability of concordance between $(X_1, Y_1)$ and $(X_2, Y_2)$, given some one of the six patterns, is just one-third the number of concordant pairs out of the three possible unordered pairs of points in the pattern. From (7.2) it follows that $3\Pi_c - 2\omega_c = p_{123} + p_{312} + p_{231}$, whence $0 \le 3\Pi_c - 2\omega_c \le 1$, whence Daniels’ inequality by substituting $\Pi_c = (1 + \tau)/2$ and $\omega_c = (1 + \rho_S)/2$ (which turns $3\Pi_c - 2\omega_c$ into $(1 + 3\tau - 2\rho_S)/2$). It is clear from the above that the right (left) side of Daniels’ inequality is achieved only when $p_{123} + p_{312} + p_{231} = 1$ (0).
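Because (7.2) expresses $\Pi_c$ and $\omega_c$ linearly in the six pattern probabilities, the algebra behind Daniels’ inequality can be checked on arbitrary probability vectors. The Python sketch below draws random points of the probability simplex (an illustrative choice; not every such vector need arise from an actual distribution) and confirms both the identity $3\Pi_c - 2\omega_c = p_{123} + p_{312} + p_{231}$ and the bounds (7.1).

```python
import random

PATTERNS = ("123", "132", "213", "312", "231", "321")


def pi_c_omega_c(p):
    """Equation (7.2); p maps each y permutation of Table 825 to its probability."""
    pi_c = p["123"] + (2 / 3) * (p["132"] + p["213"]) + (1 / 3) * (p["312"] + p["231"])
    omega_c = p["123"] + p["132"] + p["213"]
    return pi_c, omega_c


def worst_daniels_value(trials=10_000, seed=5):
    """Largest |3*tau - 2*rho_S| over random pattern-probability vectors."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        w = [rng.expovariate(1.0) for _ in PATTERNS]   # normalised exponentials: uniform on the simplex
        total = sum(w)
        p = dict(zip(PATTERNS, (wk / total for wk in w)))
        pi_c, omega_c = pi_c_omega_c(p)
        # Identity used in the proof of (7.1).
        assert abs(3 * pi_c - 2 * omega_c - (p["123"] + p["312"] + p["231"])) < 1e-12
        tau, rho_s = 2 * pi_c - 1, 2 * omega_c - 1
        worst = max(worst, abs(3 * tau - 2 * rho_s))
    return worst


print(worst_daniels_value())   # at most 1, as (7.1) requires
```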
The second inequality, or strictly pair of inequalities, was demonstrated by Durbin and Stuart [25]; it is

$$\begin{aligned}
1 + \rho_S &\ge (1 + \tau)^2/2,\\
1 - \rho_S &\ge (1 - \tau)^2/2.
\end{aligned} \qquad (7.3)$$

By making the substitutions mentioned at the end of the next-to-last paragraph, it is easily seen that (7.3) is equivalent to

$$\begin{aligned}
\omega_c &\ge \Pi_c^2,\\
\omega_d &\ge \Pi_d^2.
\end{aligned} \qquad (7.4)$$


A brief proof of the first part of (7.4) follows; the second part may be thrown back on the first by noting that $\omega_c$ and $\Pi_c$ for the distribution of $(X, -Y)$ are the same, respectively, as $\omega_d$ and $\Pi_d$ for the distribution of $(X, Y)$.

Consider, then, three independent observations on the distribution of $(X, Y)$: $(X_i, Y_i)$, $i = 1, 2, 3$. Define the random variable

$$Z = Z(X_1, Y_1) = \Pr\{(X_2, Y_2) \text{ concordant with } (X_1, Y_1) \mid (X_1, Y_1)\}, \qquad (7.5)$$

where the stroke is that of conditional probability. $Z$ is the random version of (3.1). Since $\operatorname{Var} Z \ge 0$, we have $E(Z^2) \ge (EZ)^2$, or


$$E\left[\Pr\{(X_2, Y_2) \text{ conc. with } (X_1, Y_1) \mid (X_1, Y_1)\}\,\Pr\{(X_3, Y_3) \text{ conc. with } (X_1, Y_1) \mid (X_1, Y_1)\}\right] \ge (EZ)^2. \qquad (7.6)$$

Now $EZ = \Pi_c$, and the left-hand side of (7.6), which we may call $\Pi_{cc}$, is the probability that of three independent numbered observations, the second and third are both concordant with the first. Hence

$$\Pi_{cc} \ge \Pi_c^2. \qquad (7.7)$$


Next, observe that

$$\omega_c = p_{123} + p_{132} + p_{213} \ge p_{123} + \tfrac{1}{3}p_{132} + \tfrac{1}{3}p_{213} = \Pi_{cc}. \qquad (7.8)$$

The only novelty in (7.8) is the right-most equality; this follows, as did (7.2), from an immediate examination of the conditional probabilities of the chance event whose probability is $\Pi_{cc}$, given each of the six patterns of Table 825. Finally, putting (7.7) and (7.8) together, we have the desired first part of (7.4).

Note that equality in this first part of (7.4), i.e., equality of $\omega_c$ and $\Pi_c^2$, requires that $Z(X, Y) = \Pi_c$ identically, except for a set of $(X, Y)$ values having zero probability, and also that $p_{132} = p_{213} = 0$.

It is interesting to exhibit the restrictions of (7.1) and (7.3) graphically. Figure 829 is a graph, in the $(\tau, \rho_S)$ plane, showing values of $(\tau, \rho_S)$ that are precluded by (7.1), (7.3), or both.

From Figure 829, or by the corresponding algebraic manipulations, we see that (7.1) and (7.3) together give

$$-1 + (1 + \tau)^2/2 \le \rho_S \le (1 + 3\tau)/2, \qquad \tau \le 0, \qquad (7.9n)$$
$$(-1 + 3\tau)/2 \le \rho_S \le 1 - (1 - \tau)^2/2, \qquad \tau \ge 0. \qquad (7.9p)$$
Consider distributions of the kind indicated by Figure 829a. Here all the mass is concentrated continuously on two monotone curves as shown, the upper left one having probability $a$ in toto, and the lower right one having probability $1 - a$, where $0 \le a \le 1$. For such a distribution one easily computes

$$\Pi_c = a^2 + (1 - a)^2, \qquad \omega_c = a^3 + (1 - a)^3,$$
$$\tau = 4a^2 - 4a + 1, \qquad \rho_S = 6a^2 - 6a + 1,$$

whence

$$\frac{1 - \tau}{2} = \frac{1 - \rho_S}{3},$$

or

$$\rho_S = (-1 + 3\tau)/2.$$


Since $0 \le a \le 1$, $\tau$ (in this case) runs from 0 to 1 inclusive. Hence, for $\tau \ge 0$, the left side of (7.9p) may be achieved. Similarly, considering distributions of the form of Figure 830, for $\tau \le 0$, the right side of (7.9n) may be achieved. In short, the straight line boundaries of the unshaded region in Figure 829 are the best possible.

Fig. 830.
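The two-curve construction of Figure 829a can be simulated to confirm that it lies on the line $\rho_S = (-1 + 3\tau)/2$. The Python sketch below is one hypothetical realization: the upper-left curve is the segment from $(0, 1 - a)$ to $(a, 1)$ and the lower-right one runs from $(a, 0)$ to $(1, 1 - a)$, each carrying uniform mass; any continuous increasing curves in those two blocks should give the same values. The value $a = 0.3$, the sample size, and the seed are arbitrary.

```python
import random


def draw_two_curve(a, rng):
    """Hypothetical Figure 829a-type point: upper-left increasing segment w.p. a,
    lower-right increasing segment w.p. 1 - a."""
    u = rng.random()
    if rng.random() < a:
        return a * u, (1 - a) + a * u        # x in (0, a), y in (1 - a, 1)
    return a + (1 - a) * u, (1 - a) * u      # x in (a, 1), y in (0, 1 - a)


def tau_and_rho_s(a, n=100_000, seed=6):
    rng = random.Random(seed)
    conc_pairs = omega_hits = 0
    for _ in range(n):
        pts = [draw_two_curve(a, rng) for _ in range(3)]
        # The first two points of each triple serve as a pair for Pi_c.
        if (pts[0][0] - pts[1][0]) * (pts[0][1] - pts[1][1]) > 0:
            conc_pairs += 1
        # omega_c event: some point concordant with both of the others.
        if any(all((pts[i][0] - pts[j][0]) * (pts[i][1] - pts[j][1]) > 0
                   for j in range(3) if j != i) for i in range(3)):
            omega_hits += 1
    return 2 * conc_pairs / n - 1, 2 * omega_hits / n - 1


a = 0.3
tau, rho_s = tau_and_rho_s(a)
print(tau, 4 * a * a - 4 * a + 1)      # simulated vs. exact tau
print(rho_s, 6 * a * a - 6 * a + 1)    # simulated vs. exact rho_S
print(rho_s, (-1 + 3 * tau) / 2)       # the straight-line boundary of (7.9p)
```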

As to the parabolic boundaries, the situation seems unclear. They are achieved for special values of $\tau$, namely values of $\tau$ of form $\pm(1 - (2/m))$, $m = 1, 2, 3, \ldots$. Consider, for given integral $m$, a distribution of the form of Figure 831, in which the probability mass is entirely on $m$ monotone decreasing curve segments as shown, the segments themselves falling in a sequence of concordant rectangles. (In Figure 831 the dashed lines are guides to the eye only.) Each segment has probability $1/m$ spread on it in any continuous manner. Clearly


$$\Pi_c = 1 - \Pi_d = 1 - m/m^2 = (m - 1)/m,$$
$$\omega_c = 1 - \omega_d = 1 - m/m^3 = (m^2 - 1)/m^2,$$

so that

$$\tau = 1 - (2/m), \qquad \rho_S = 1 - (2/m^2),$$

and

$$1 - \rho_S = 2/m^2 = (1 - \tau)^2/2.$$
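The values for the Figure 831 construction can be confirmed exactly by enumerating which segments two or three hypothetical observations fall on, using exact rational arithmetic. The comments encode the reasoning in the text: a pair is discordant only when both points share a segment, and a triple falls in the $\omega_d$ event only when all three share a segment. The function name is illustrative.

```python
from fractions import Fraction
from itertools import product


def m_segment_tau_rho_s(m):
    """Exact (tau, rho_S) for m decreasing segments of probability 1/m each,
    placed in a chain of concordant rectangles (Figure 831)."""
    # Two points are discordant iff they lie on the same segment.
    same_pair = sum(1 for s, t in product(range(m), repeat=2) if s == t)
    pi_d = Fraction(same_pair, m ** 2)
    # If all three points share a segment, every pair is discordant (omega_d event).
    # Otherwise some point is alone on its segment (or all three segments differ),
    # and that point is concordant with both others (omega_c event).
    same_triple = sum(1 for s in product(range(m), repeat=3) if len(set(s)) == 1)
    omega_d = Fraction(same_triple, m ** 3)
    return 1 - 2 * pi_d, 1 - 2 * omega_d


for m in range(1, 8):
    tau, rho_s = m_segment_tau_rho_s(m)
    print(m, tau, rho_s, 1 - rho_s == (1 - tau) ** 2 / 2)
```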


Thus, for $\tau = 0, \tfrac{1}{3}, \tfrac{1}{2}, \tfrac{3}{5}, \tfrac{2}{3}, \tfrac{5}{7}, \ldots, 1 - (2/m), \ldots$, the right side of (7.9p) may be achieved. A similar argument shows that for $\tau = 0, -\tfrac{1}{3}, -\tfrac{1}{2}, \ldots, -1 + (2/m), \ldots$, the left side of (7.9n) may be achieved. So far as I know, the question of best bounds for $\rho_S$, when $\tau$ has other values than the above, is unresolved.

Fig. 831. (Case $m = 5$).


8. RELATIONS BETWEEN $q$ AND $\tau$, AND BETWEEN $q$ AND $\rho_S$


Suppose that $q$ has some given value; how large and how small can $\tau$ be? It is easy to see that, if $q$ is given, the first and third quadrants around $(\operatorname{Med} X, \operatorname{Med} Y)$ must each have probability $(1 + q)/4$, while the second and fourth quadrants must each have probability $(1 - q)/4$. In order to make $\Pi_c$ (whence $\tau$) as large as possible we clearly want the mass concentrated on monotone increasing curves within each quadrant. A further argument, based on the two “observations” entering into the definition of $\Pi_c$, when they fall in different quadrants, shows that $\Pi_c$ is maximum when the joint distribution has the general appearance of Figure 832.

For such a distribution $\Pi_d = 2[(1 - q)/4]^2 = (1 - q)^2/8$, so that $\tau = 1 - (1 - q)^2/4$. A similar argument shows that the minimum value of $\tau$ is $(1 + q)^2/4 - 1$. Hence we have the inequality

$$(1 + q)^2/4 - 1 \le \tau \le 1 - (1 - q)^2/4, \qquad (8.1)$$

which is best possible. Figure 832a shows the relationship between $\tau$ and $q$ expressed by (8.1).
Fig. 832a. Inequalities between $q$ and $\tau$. Shaded area excluded.

Fig. 833. Inequalities between $q$ and $\rho_S$. Shaded areas excluded.


Now for similar work as to $q$ and $\rho_S$. A little reflection shows that a distribution like that of Figure 832 maximizes $\rho_S$ for given $q$. For such a distribution

$$\omega_d = 6[(1 - q)^3/64],$$

so that

$$\rho_S = 1 - \tfrac{3}{16}(1 - q)^3.$$

If the complementary computation be made, we find that

$$\tfrac{3}{16}(1 + q)^3 - 1 \le \rho_S \le 1 - \tfrac{3}{16}(1 - q)^3 \qquad (8.2)$$

gives the best possible inequality between $q$ and $\rho_S$. Figure 833 exhibits it.
9. BIVARIATE NORMAL DISTRIBUTIONS AND A ONE-PARAMETER 
NON-NORMAL FAMILY 


It may be worth recapitulating the values of $q$, $\tau$, and $\rho_S$ for bivariate normal distributions. They depend only on the correlation coefficient, $\rho$, and we have shown that

$$q = \tau = \frac{2}{\pi}\sin^{-1}\rho \qquad \text{(Sections 3 and 4)}$$
$$\rho_S = \frac{6}{\pi}\sin^{-1}(\rho/2). \qquad (6.4)$$

Graphs of these two functions appear in Figure 834 for non-negative values of $\rho$. For negative values of $\rho$, only the signs of $q$, $\tau$, and $\rho_S$ are changed.
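The normal-theory formulas are easy to tabulate, which gives the numbers behind Figure 834. As a consistency check, the Python sketch below also confirms that each normal point respects the general bounds (7.9p), (8.1), and (8.2); the grid of $\rho$ values and the function names are arbitrary choices.

```python
import math


def normal_measures(rho):
    """(q, tau, rho_S) for a bivariate normal: q = tau = (2/pi) arcsin(rho),
    rho_S = (6/pi) arcsin(rho/2)."""
    tau = (2 / math.pi) * math.asin(rho)
    rho_s = (6 / math.pi) * math.asin(rho / 2)
    return tau, tau, rho_s


def within_bounds(q, tau, rho_s, eps=1e-12):
    """Check (7.9p), (8.1) and (8.2); (7.9p) applies because tau >= 0 for rho >= 0."""
    ok_79p = (-1 + 3 * tau) / 2 - eps <= rho_s <= 1 - (1 - tau) ** 2 / 2 + eps
    ok_81 = (1 + q) ** 2 / 4 - 1 - eps <= tau <= 1 - (1 - q) ** 2 / 4 + eps
    ok_82 = 3 / 16 * (1 + q) ** 3 - 1 - eps <= rho_s <= 1 - 3 / 16 * (1 - q) ** 3 + eps
    return ok_79p and ok_81 and ok_82


for i in range(11):
    rho = i / 10
    q, tau, rho_s = normal_measures(rho)
    print(f"rho={rho:.1f}  q=tau={tau:.3f}  rho_S={rho_s:.3f}  bounds ok: {within_bounds(q, tau, rho_s)}")
```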
Fig. 834. Relations between $\rho$, $q$, $\tau$, and $\rho_S$ for bivariate normal distributions.

Fig. 834a. Distribution of kind discussed in section 9. Density function is $1/[\theta^2 + (1 - \theta)^2]$ in shaded area, and zero elsewhere.
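Further insight into the nature of $q$, $\tau$, and $\rho_S$ may be obtained by computing their values for the members of a simple one-parameter family of distributions. As an illustration consider the one-parameter family of distributions within the unit square for which the probability mass is uniformly spread within the two squares $((0, 0), (\theta, 0), (\theta, \theta), (0, \theta))$ and $((\theta, \theta), (1, \theta), (1, 1), (\theta, 1))$. Here $0 \le \theta \le 1$. Such a distribution is pictured in Figure 834a. If we let $A(\theta) = \theta^2 + (1 - \theta)^2$, then it is easily computed that the marginal density function of both $X$ and $Y$ is

$$\begin{cases} \theta/A(\theta), & 0 \le x \text{ (or } y) \le \theta,\\ (1 - \theta)/A(\theta), & \theta \le x \text{ (or } y) \le 1,\\ 0 & \text{elsewhere.} \end{cases} \qquad (9.1)$$

Further $\operatorname{Med} X = \operatorname{Med} Y = 1 - A(\theta)/[2(1 - \theta)]$ when $\theta \le \tfrac{1}{2}$, and $A(\theta)/(2\theta)$ when $\theta \ge \tfrac{1}{2}$. It is then readily computed that

$$q = \begin{cases} \theta^2/(1 - \theta)^2, & \theta \le \tfrac{1}{2},\\ (1 - \theta)^2/\theta^2, & \theta \ge \tfrac{1}{2}, \end{cases} \qquad (9.2)$$
$$\tau = 2\theta^2(1 - \theta)^2/A^2(\theta), \qquad (9.3)$$
$$\rho_S = 3\theta^2(1 - \theta)^2/A^2(\theta) = \tfrac{3}{2}\tau. \qquad (9.4)$$

Graphs of these functions (all symmetrical about $\theta = \tfrac{1}{2}$) are given in Fig. 835.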
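Formulas (9.2) to (9.4) can be spot-checked by simulation. The Python sketch below samples uniformly from the two squares (choosing the lower square with probability $\theta^2/A(\theta)$, its share of the total area), uses the exact median from the text to estimate $q$, and estimates $\tau$ and $\rho_S$ from pairs and triples. The choice $\theta = 0.3$, the sample size, the seed and the function names are arbitrary.

```python
import random


def A(theta):
    return theta ** 2 + (1 - theta) ** 2


def draw_two_squares(theta, rng):
    """Uniform on [0, theta]^2 union [theta, 1]^2 (Figure 834a)."""
    if rng.random() < theta ** 2 / A(theta):
        return theta * rng.random(), theta * rng.random()
    return theta + (1 - theta) * rng.random(), theta + (1 - theta) * rng.random()


def exact(theta):
    """(q, tau, rho_S) from (9.2)-(9.4)."""
    q = theta ** 2 / (1 - theta) ** 2 if theta <= 0.5 else (1 - theta) ** 2 / theta ** 2
    tau = 2 * theta ** 2 * (1 - theta) ** 2 / A(theta) ** 2
    return q, tau, 1.5 * tau


def simulate(theta, n=100_000, seed=7):
    rng = random.Random(seed)
    med = 1 - A(theta) / (2 * (1 - theta)) if theta <= 0.5 else A(theta) / (2 * theta)
    quad = conc = omega = 0
    for _ in range(n):
        p = [draw_two_squares(theta, rng) for _ in range(3)]
        # q: +1 for first/third quadrant about the medians, -1 otherwise
        quad += 1 if (p[0][0] - med) * (p[0][1] - med) > 0 else -1
        conc += (p[0][0] - p[1][0]) * (p[0][1] - p[1][1]) > 0
        omega += any(all((p[i][0] - p[j][0]) * (p[i][1] - p[j][1]) > 0
                         for j in range(3) if j != i) for i in range(3))
    return quad / n, 2 * conc / n - 1, 2 * omega / n - 1


print(simulate(0.3))
print(exact(0.3))
```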





Fig. 835. Graphs of $q$, $\tau$, and $\rho_S$ as functions of $\theta$ for family of distributions discussed in section 9.


10. ESTIMATION OF THE QUADRANT MEASURE


Suppose now that we have a random sample of $n$, $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$, from a bivariate distribution with continuous marginals, and that we wish to estimate the quadrant measure $q$ for this distribution. The natural procedure is to find the sample medians and to let the estimator be

$$\hat q = \frac{1}{n}[\text{No. of } (X_i, Y_i)\text{'s in first and third quadrants around sample medians}] - \frac{1}{n}[\text{No. of } (X_i, Y_i)\text{'s in second and fourth quadrants around sample medians}]. \qquad (10.1)$$

If $n$ is even, (10.1) defines a sample statistic unambiguously. If $n$ is odd, some slight modification is needed, since one or two sample points will have their $x$ or $y$ values exactly equal to the corresponding medians.
Blomqvist [10] has made the definition precise and has discussed the distribution of $\hat q$. His method of making (10.1) precise is the following:

If $n$ is even, let $n_1$ and $n_2$ be the integers in the two square brackets of (10.1) respectively.
If $n$ is odd, and $(X_m, Y_m)$ furnishes the sample median for both $x$ and $y$, neglect $(X_m, Y_m)$ in counting the quadrant numbers, $n_1$ and $n_2$.
If $n$ is odd and the sample medians for $x$ and $y$ correspond to two different $(X_i, Y_i)$'s, count these two as one point only, and assign them to the quadrant touched by both points for purposes of computing $n_1$ and $n_2$.

A simple sketch for $n = 3$ will illustrate the above definition of $n_1$ and $n_2$. Then $\hat q = (n_1 - n_2)/(n_1 + n_2)$. (Note on terminology: Blomqvist uses “q” for the population quadrant measure and “q′” for the sample estimator $\hat q$.)

Blomqvist showed that, under mild regularity conditions,

$$\frac{\sqrt{n}\,(\hat q - q)}{\sqrt{1 - q^2}} \qquad (10.2)$$

is asymptotically unit-normal. From this result, approximate confidence intervals for $q$, or tests of the null hypothesis that $q$ is some specific value, may readily be obtained. Even more simply, but probably less accurately,

$$\frac{\sqrt{n}\,(\hat q - q)}{\sqrt{1 - \hat q^2}} \qquad (10.3)$$

is approximately unit-normal for large $n$, thus permitting very simple approximate tests and confidence interval procedures.

An exact test that $X$ and $Y$ are independent is readily obtained on the basis of $\hat q$. Blomqvist provided tables for its use, and showed that, in the bivariate normal case, its asymptotic efficiency is 41 per cent of the test based on the correlation coefficient. Konijn [53] corrects Blomqvist’s statement of regularity conditions and discusses asymptotic power.
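A Python sketch of $\hat q$ following the counting rules described above, together with the rough interval suggested by (10.3). It assumes no ties among the $x$'s or among the $y$'s; the function names and the 1.96 multiplier (for roughly 95 per cent coverage) are illustrative choices, not taken from Blomqvist.

```python
import math
import statistics


def blomqvist_q_hat(xs, ys):
    """Sample quadrant measure q_hat = (n1 - n2)/(n1 + n2) with the odd-n rules above.
    Assumes lists with no ties in xs and no ties in ys."""
    n = len(xs)
    mx, my = statistics.median(xs), statistics.median(ys)
    skip, signs = set(), []
    if n % 2 == 1:
        ix, iy = xs.index(mx), ys.index(my)    # points furnishing the two medians
        skip = {ix, iy}                         # a single point if ix == iy: neglected
        if ix != iy:
            # Two different median points count as one point, placed in the quadrant
            # touched by both: above/below from the x-median point, right/left from
            # the y-median point.
            signs.append((ys[ix] - my) * (xs[iy] - mx))
    signs += [(x - mx) * (y - my) for i, (x, y) in enumerate(zip(xs, ys)) if i not in skip]
    n1 = sum(1 for s in signs if s > 0)
    n2 = sum(1 for s in signs if s < 0)
    return (n1 - n2) / (n1 + n2)


def approx_interval(q_hat, n, z=1.96):
    """Rough interval for q from (10.3): sqrt(n)(q_hat - q)/sqrt(1 - q_hat^2) ~ N(0, 1)."""
    half = z * math.sqrt((1 - q_hat ** 2) / n)
    return max(-1.0, q_hat - half), min(1.0, q_hat + half)
```

For example, `blomqvist_q_hat([1, 2, 3], [1, 3, 2])` returns 1.0: the $x$-median point (2, 3) and the $y$-median point (3, 2) are merged into one point in the first quadrant, and (1, 1) lies in the third.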


11. ESTIMATION OF $\tau$


The natural estimator of $\Pi_c$ from a random sample of $n$, $(X_1, Y_1), \ldots, (X_n, Y_n)$, is simply the relative frequency of concordant pairs of observations. As before, we assume continuous marginals. Let us denote this estimator by

$$P_c = \frac{\text{Number of concordant } (X_i, Y_i), (X_j, Y_j),\ i \ne j}{n(n - 1)} = \frac{\text{Number of concordant } (X_i, Y_i), (X_j, Y_j),\ i < j}{\tfrac{1}{2}n(n - 1)}, \qquad (11.1)$$

where the second form uses the fact that a pair of bivariate observations need only be considered in one order.

It is sometimes useful to present $P_c$ in terms of a score, as follows. For each unordered pair of observations, $((X_i, Y_i), (X_j, Y_j))$ with $i \ne j$, score 1 or 0 according as concordance or discordance obtains. Then $P_c$ = sum of scores divided by $\tfrac{1}{2}n(n - 1)$.

An equivalent and often convenient rewording is readily given in terms of ranks. Replace the $X_i$'s by the numbers $1, 2, \ldots, n$ (lowest $X_i$ replaced by 1, next lowest $X_i$ by 2, and so on). Similarly replace the $Y_i$'s by their ranks. If we compute $P_c$ with $(X_i, Y_i)$ replaced by $(\text{rank } X_i, \text{rank } Y_i)$, $P_c$ is unchanged, since two observations are concordant if and only if they are concordant after their components are replaced by ranks.

We may write down, in order, the rank of that $Y_i$ whose $X_i$ has rank 1, followed by the rank of that $Y_i$ whose $X_i$ has rank 2, etc. A table of the following form is obtained:

| $X_i$ ranks | 1 | 2 | 3 | 4 | $\cdots$ | $n$ |
|---|---|---|---|---|---|---|
| Corresponding $Y_i$ ranks | $S_1$ | $S_2$ | $S_3$ | $S_4$ | $\cdots$ | $S_n$ |

where the $S_i$'s form a permutation of $1, 2, \ldots, n$. Consider now concordances among pairs of observed ranks one of whose members is $(1, S_1)$. This number will range from 0 (if $S_1 = n$) to $n - 1$ (if $S_1 = 1$). We may consider it as obtained by scoring a 1 or a 0 for each of the pairs considered, according as concordance or discordance holds, and then summing the scores. Having finished with pairs of observations that include $(1, S_1)$, we turn to those that include $(2, S_2)$ and exclude $(1, S_1)$. Scoring as before, we obtain a sum ranging from 0 to $n - 2$. Continue in this way, and add up all the scores. We obtain a number ranging from 0 (when $S_i = n - i + 1$) to $(n - 1) + (n - 2) + (n - 3) + \cdots + 1 = \tfrac{1}{2}n(n - 1)$ (when $S_i = i$). Dividing the total score by $\tfrac{1}{2}n(n - 1)$ we obtain $P_c$, a number ranging from 0 to 1. The computing procedure described above is often the simplest. Note that the ranks do not enter intrinsically at all, since the same operation may be performed on the ordered observations. However, the use of ranks simplifies the comparisons.

Similarly, one may score for discordance and obtain $P_d$, the natural estimator of $\Pi_d$. Then $t$, the estimator of $\tau$, is $P_c - P_d$, or, equivalently, $2P_c - 1 = 1 - 2P_d$. Again, equivalently, we may score each of the $\tfrac{1}{2}n(n - 1)$ unordered pairs 1 or $-1$ according as concordance or discordance obtains. To obtain $t$, divide the total resulting score by $\tfrac{1}{2}n(n - 1)$.
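The two computing routes just described, scoring every unordered pair and scoring along the sequence $S_1, \ldots, S_n$, can be sketched in Python as follows; they must agree. The function names and the five-point sample are illustrative, and no ties are assumed.

```python
from itertools import combinations


def kendall_t_by_pairs(xs, ys):
    """Score each unordered pair +1 (concordant) or -1 (discordant); divide by n(n-1)/2."""
    score = sum(1 if (x1 - x2) * (y1 - y2) > 0 else -1
                for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2))
    n = len(xs)
    return score / (n * (n - 1) / 2)


def y_ranks_in_x_order(xs, ys):
    """The sequence S_1, ..., S_n: the rank of each Y_i, listed in increasing order of X_i."""
    y_rank = {y: r for r, y in enumerate(sorted(ys), start=1)}
    return [y_rank[y] for _, y in sorted(zip(xs, ys))]


def kendall_t_from_sequence(S):
    """For each position i, score 1 for every later S_j exceeding S_i; P_c = total / (n(n-1)/2)."""
    n = len(S)
    total = sum(1 for i in range(n) for j in range(i + 1, n) if S[j] > S[i])
    p_c = total / (n * (n - 1) / 2)
    return 2 * p_c - 1


xs = [1.2, 3.4, 2.2, 5.0, 4.1]
ys = [2.0, 3.1, 1.5, 4.4, 5.2]
S = y_ranks_in_x_order(xs, ys)   # [2, 1, 3, 5, 4]
print(S, kendall_t_by_pairs(xs, ys), kendall_t_from_sequence(S))
```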

A clever graphical method of computing $t$ has been suggested by S. D. Holmes [75, Appendix B]. I am indebted to Harold D. Griffin for calling this to my attention. Griffin discusses the method [37].

It is interesting to note that $\binom{n}{2}P_d$ is just the minimum number of inversions of neighboring $S_i$'s necessary to permute the $S_i$'s into the order $1, 2, \ldots, n$. For (a) the inversion of a discordant pair of neighbor $S_i$'s decreases $\binom{n}{2}P_d$ by 1, and also decreases the number of necessary subsequent inversions by 1. And (b) the greatest value of $\binom{n}{2}P_d$ and the greatest number of necessary inversions (both occurring when $S_i = n - i + 1$) are $\tfrac{1}{2}n(n - 1)$.
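The claim that $\binom{n}{2}P_d$ equals the minimum number of neighbor inversions can be checked by brute force. Bubble sort swaps only out-of-order neighbors, and each such swap removes exactly one discordant pair, so its swap count should equal the number of discordant pairs. The Python sketch below verifies this over all permutations of 1 to 6; the function names are illustrative.

```python
from itertools import permutations


def neighbour_swaps_to_sort(S):
    """Bubble sort a copy of S, counting swaps of adjacent out-of-order entries."""
    S = list(S)
    swaps = 0
    for end in range(len(S) - 1, 0, -1):
        for i in range(end):
            if S[i] > S[i + 1]:
                S[i], S[i + 1] = S[i + 1], S[i]
                swaps += 1
    return swaps


def discordant_pairs(S):
    """C(n, 2) * P_d: the number of pairs i < j with S_i > S_j."""
    n = len(S)
    return sum(1 for i in range(n) for j in range(i + 1, n) if S[i] > S[j])


perms = list(permutations(range(1, 7)))
assert all(neighbour_swaps_to_sort(p) == discordant_pairs(p) for p in perms)
print(max(discordant_pairs(p) for p in perms))   # 15 = n(n-1)/2 for n = 6, from 6 5 4 3 2 1
```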

The exact distribution of $t$ in the case of independence may be computed recursively; the most extensive tabulation is that of Kaarsemaker and van Wijngaarden [48]. The distribution of $t$ in the general bivariate normal case has been investigated by E. F. Fieller, H. O. Hartley, and E. S. Pearson [30a], using empirical sampling methods. In the same article the inverse hyperbolic tangent function is considered as a normalizing and variance-stabilizing transformation.
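One standard way to carry out such a recursion (a sketch, not necessarily the scheme used by Kaarsemaker and van Wijngaarden) builds the number of permutations of $1, \ldots, n$ with a given number of discordant pairs by inserting the largest element $m$ into a permutation of $1, \ldots, m - 1$: placing it $j$ positions from the right end adds exactly $j$ discordant pairs. Since $t = 1 - 2P_d$, this yields the exact null distribution of $t$, whose variance can be compared with $2(2n + 5)/[9n(n - 1)]$, the null variance of $t$ implied by the exact variance of $\sqrt{n}\,t$ quoted later in this section. Function names are illustrative.

```python
from fractions import Fraction


def discordance_counts(n):
    """counts[d] = number of permutations of 1..n with exactly d discordant pairs (n >= 1)."""
    counts = [1]                                  # n = 1: one permutation, no pairs
    for m in range(2, n + 1):
        new = [0] * (len(counts) + m - 1)
        for d, c in enumerate(counts):
            for j in range(m):                    # inserting m adds j discordant pairs, j = 0..m-1
                new[d + j] += c
        counts = new
    return counts


def null_distribution_of_t(n):
    """Exact distribution of t = 1 - 2 d / C(n, 2) under independence (n >= 2)."""
    counts = discordance_counts(n)
    total, pairs = sum(counts), n * (n - 1) // 2
    return {1 - Fraction(2 * d, pairs): Fraction(c, total) for d, c in enumerate(counts)}


def variance(dist):
    mean = sum(t * p for t, p in dist.items())
    return sum((t - mean) ** 2 * p for t, p in dist.items())


for n in range(2, 9):
    exact = variance(null_distribution_of_t(n))
    print(n, exact, exact == Fraction(2 * (2 * n + 5), 9 * n * (n - 1)))
```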

It is immediate from its definition that $Et = \tau$. The variance of $t$ is not hard to compute in terms of a quantity already introduced in section 7,

$$\Pi_{cc} = \Pr\{\text{of three independent observations, the second and third are concordant with the first}\}. \qquad (11.2)$$

One finds (see, e.g., Hoeffding [43])

$$\operatorname{Var} t = \frac{8}{n(n - 1)}\,\Pi_c(1 - \Pi_c) + 16\,\frac{n - 2}{n(n - 1)}\,(\Pi_{cc} - \Pi_c^2). \qquad (11.3)$$

Note that as $n \to \infty$, this quantity times $n$ has the limit $16(\Pi_{cc} - \Pi_c^2)$. It may be shown (Hoeffding [43] and [44]) that, unless $\Pi_{cc} = \Pi_c^2$,

$$\frac{\sqrt{n}\,(t - \tau)}{4\sqrt{P_{cc} - P_c^2}} \to N(0, 1) \qquad (11.4)$$

in distribution, where $P_{cc}$ is the sample analogue of $\Pi_{cc}$, just as $P_c$ is that of $\Pi_c$. The meaning of (11.4) is that the probability that the quantity on the left lies in any fixed interval has as its limit ($n \to \infty$) the probability of that interval under the unit-normal distribution.
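Equation (11.3) is simple to evaluate exactly. The Python sketch below checks two consequences stated in this section: with the independence values $\Pi_c = 1/2$ and $\Pi_{cc} = 5/18$ it reproduces $2(2n + 5)/[9n(n - 1)]$, and when $\Pi_{cc}$ takes its largest admissible value $(\Pi_c + \Pi_c^2)/2$ it equals the Daniels and Kendall bound $2(1 - \tau^2)/n$ exactly. The sample sizes and $\Pi_c$ values tried are arbitrary.

```python
from fractions import Fraction


def var_t(n, pi_c, pi_cc):
    """Equation (11.3): exact variance of t for a sample of size n >= 2."""
    return (Fraction(8, n * (n - 1)) * pi_c * (1 - pi_c)
            + 16 * Fraction(n - 2, n * (n - 1)) * (pi_cc - pi_c ** 2))


half, five_18 = Fraction(1, 2), Fraction(5, 18)
for n in range(2, 12):
    # independence: Pi_c = 1/2, Pi_cc = 5/18
    assert var_t(n, half, five_18) == Fraction(2 * (2 * n + 5), 9 * n * (n - 1))

for pi_c in (Fraction(1, 3), Fraction(3, 5), Fraction(9, 10)):
    tau = 2 * pi_c - 1
    pi_cc_max = (pi_c + pi_c ** 2) / 2            # largest value allowed by 2*Pi_cc <= Pi_c + Pi_c^2
    for n in (5, 20, 100):
        assert var_t(n, pi_c, pi_cc_max) == 2 * (1 - tau ** 2) / n

print(var_t(10, half, five_18))   # 5/81
```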

The condition $\Pi_{cc} \ne \Pi_c^2$ means (see section 7) that the probability that observation 2 is concordant with observation 1, given observation 1, is not essentially constant. It will be satisfied except for rather unusual distributions such as those used as examples in section 7. If a genuine density function exists, the condition will be satisfied.

An upper bound for $\operatorname{Var} t$, suggesting “conservative” simple approximate tests and confidence intervals, has been given by Daniels and Kendall [15]: $\operatorname{Var} t \le 2(1 - \tau^2)/n$. This inequality follows by direct substitution from the relationship $2\Pi_{cc} \le \Pi_c + \Pi_c^2$. And this relationship, in turn, may readily be demonstrated, for the no-ties random sample situation that we are discussing, as follows. Consider four independent observations on the distribution of interest: $(X_i, Y_i)$, $i = 1, 2, 3, 4$. Let $W_{ij} = 1$ or 0 according as $(X_i, Y_i)$ is concordant with $(X_j, Y_j)$ or discordant. Then $W_{ij} = W_{ji}$, $EW_{ij} = \Pi_c$, $\operatorname{Var} W_{ij} = \Pi_c - \Pi_c^2$, and $\operatorname{Cov}(W_{ij}, W_{ik}) = \Pi_{cc} - \Pi_c^2$ (for $j \ne k$) by direct computation. Next compute

$$\operatorname{Var}[2(W_{12} + W_{34}) - (W_{13} + W_{14} + W_{23} + W_{24})] \ge 0,$$

which works out to $12(\Pi_c - \Pi_c^2) - 24(\Pi_{cc} - \Pi_c^2) \ge 0$, to find that $\Pi_c + \Pi_c^2 - 2\Pi_{cc} \ge 0$, the desired result.

To use $t$ as an approximate test of independence we note that, under independence, $\Pi_{cc} = 5/18$ and $\Pi_c = 1/2$. Hence $\Pi_{cc} - \Pi_c^2 = 1/36$ and $\sqrt{n}\,3t/2$ is approximately unit-normal for large $n$. Apparently, a more accurate test is obtained by using the exact variance of $\sqrt{n}\,t$ under the hypothesis of independence, $2(2n + 5)/[9(n - 1)]$; that is, by taking

$$\sqrt{n}\,\sqrt{\frac{9(n - 1)}{2(2n + 5)}}\;t$$

as approximately unit-normal under the null hypothesis.
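In Python, the null-hypothesis standardization just described might look as follows; the two-sided normal tail probability is computed with `math.erfc`. The function names and the example values ($n = 10$, $t = 0.6$) are illustrative.

```python
import math


def kendall_z(t, n):
    """sqrt(n) * sqrt(9(n-1) / (2(2n+5))) * t: approximately N(0, 1) under independence."""
    return math.sqrt(n) * math.sqrt(9 * (n - 1) / (2 * (2 * n + 5))) * t


def kendall_z_asymptotic(t, n):
    """The cruder large-n version, sqrt(n) * 3t/2."""
    return math.sqrt(n) * 1.5 * t


def two_sided_p(z):
    """Two-sided tail probability of the unit-normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


z = kendall_z(0.6, 10)
print(round(z, 3), round(two_sided_p(z), 4))
```

For large $n$ the two standardizations agree, since $9(n - 1)/[2(2n + 5)] \to 9/4$.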

The asymptotic efficiency (Pitman sense) of this test in the normal case, as contrasted with that based on the correlation coefficient, is $9/\pi^2 \approx .91$.

A variant of $t$ has been suggested by Whitfield [90] as appropriate for cases where the distribution is known to be symmetric, in the sense that $(X, Y)$ has the same distribution as $(Y, X)$. This situation is analogous to one-way model II analysis of variance with two observations per group, when viewed from the intra-class correlation standpoint. The essential point is that $(X_i, Y_i)$ and $(Y_i, X_i)$ are equally good observations on the distribution of $(X, Y)$. Whitfield suggests, in effect, that all the $2^n$ possible values of $t$ be averaged as an estimate of $\tau$, and he provides a shorter way of carrying out the computation that does not require the evaluation of many $t$'s. In addition he discusses and tabulates for $n = 6(2)20$ the distribution of the resulting average $t$ under the hypothesis of independence.


12. ESTIMATION OF $\rho_S$


Perhaps the simplest estimator of $\rho_S$ may be motivated by using the definition of $\rho_S$ in the form

$$\rho_S = 6\Pr\{(X_1, Y_1), (X_2, Y_3) \text{ concordant}\} - 3 = 6\pi_c - 3.$$


In one sense the natural estimator of the joint distribution of $(X_1, Y_1)$ is that discrete distribution putting probability mass $1/n$ on each $(X_i, Y_i)$ of the sample actually obtained. Similarly, one natural estimator of the distribution of $(X_2, Y_3)$, that is, the bivariate distribution with independent coordinates and the same marginals as $(X_1, Y_1)$, is that discrete distribution putting mass $1/n^2$ on each of the $n^2$ points $(X_j, Y_k)$ formed from the sample that is actually obtained.

If we adopt this viewpoint, a fairly reasonable estimator of $\pi_c = \Pr\{(X_1, Y_1), (X_2, Y_3)\ \text{concordant}\}$ is obtained by computing the corresponding quantity for the two discrete distributions described above. But here we very definitely do not have continuous marginals, and there are several ways one might proceed. I choose one variation that leads to the conventional sample value of Spearman's rank correlation coefficient.

Consider any one of the actual observations, $(X_i, Y_i)$, and obtain from this a partial estimate of $\pi_c = \Pr\{(X_1, Y_1), (X_2, Y_3)\ \text{conc.}\}$ as follows.

There are $n^2 - 1$ points of the form $(X_j, Y_k)$, excluding the point that we are working with itself. If $R_{X_i}$ is the rank of $X_i$ among the $X$'s and $R_{Y_i}$ the rank of $Y_i$ among the $Y$'s, then, out of the $n^2 - 1$ points $(X_j, Y_k)$, exactly $(R_{X_i} - 1)(R_{Y_i} - 1)$ will lie below and to the left of $(X_i, Y_i)$. Similarly, exactly $(n - R_{X_i})(n - R_{Y_i})$ will lie above and to the right of $(X_i, Y_i)$. These relations are easy to picture; Figure 840 shows a typical sample with $n = 8$. The heavy dots are the observed points, and through them horizontal and vertical lines parallel to the axes have been drawn. The heavy circled dot is $(X_i, Y_i)$. In the pictured case $R_{X_i} = 4$ and $R_{Y_i} = 3$. There are $(4-1)(3-1) = 6$ intersections below and to the left of the circled dot; there are $(8-4)(8-3) = 20$ intersections above and to the right of the circled dot.

Fig. 840

The total number of intersections concordant with $(X_i, Y_i)$ is, therefore, in general

$$(R_{X_i} - 1)(R_{Y_i} - 1) + (n - R_{X_i})(n - R_{Y_i}) = 2R_{X_i}R_{Y_i} - (n+1)(R_{X_i} + R_{Y_i}) + n^2 + 1. \tag{12.1}$$


But this count is unfair, for there are $2(n-1)$ intersections on the horizontal and vertical lines through $(X_i, Y_i)$; these are tied in exactly one coordinate with $(X_i, Y_i)$. How should we take these into account? It seems reasonable to count them at half their number, that is, to add $n - 1$ to (12.1), on the grounds that they lie exactly poised between concordance and discordance. Thus the total number of intersections concordant with $(X_i, Y_i)$, plus half the number tied in one coordinate, all divided by $n^2 - 1$, the total number of intersections excluding $(X_i, Y_i)$, is

$$\frac{1}{n^2 - 1}\left\{2R_{X_i}R_{Y_i} - (n+1)(R_{X_i} + R_{Y_i}) + n(n+1)\right\}. \tag{12.2}$$


Next, average this over the $n$ points $(X_i, Y_i)$, using $\sum_i R_{X_i} = \sum_i R_{Y_i} = n(n+1)/2$, to obtain

$$\frac{1}{n(n^2 - 1)}\left\{2\sum_i R_{X_i}R_{Y_i} - n(n+1)^2 + n^2(n+1)\right\} = \frac{1}{n(n^2 - 1)}\left\{2\sum_i R_{X_i}R_{Y_i} - n(n+1)\right\}. \tag{12.3}$$


This is a fairly reasonable estimator of $\pi_c$; we might call it $\hat\pi_c$. Now take 6 times $\hat\pi_c$, and subtract 3, to obtain, as an estimator of $\rho_s$,

$$r_s = \frac{1}{n(n^2 - 1)}\left\{12\sum_i R_{X_i}R_{Y_i} - 6n(n+1) - 3n(n^2 - 1)\right\} = \frac{1}{n(n^2 - 1)}\left\{12\sum_i R_{X_i}R_{Y_i} - 3n(n+1)^2\right\}. \tag{12.4}$$


This is the famous Spearman sample rank correlation coefficient. Two more usual forms of it are

$$r_s = \frac{\dfrac{1}{n}\displaystyle\sum_i \left(R_{X_i} - \frac{n+1}{2}\right)\left(R_{Y_i} - \frac{n+1}{2}\right)}{(n^2 - 1)/12}, \tag{12.5}$$

or the ordinary correlation coefficient computed in terms of the ranks; and

$$r_s = 1 - \frac{6\sum_i (R_{X_i} - R_{Y_i})^2}{n(n^2 - 1)}, \tag{12.6}$$

a form often convenient for computation. The equivalence of the above three forms is easily verified by elementary algebraic manipulation.
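The whole chain (12.1)-(12.6) can be checked numerically. The following Python sketch (hypothetical helper names; untied data assumed) counts the grid intersections of Fig. 840 directly, scoring points tied in exactly one coordinate as 1/2 as in (12.2), averages over the sample points, and checks that six times the average minus 3 agrees with the three forms (12.4)-(12.6). Exact rational arithmetic via `fractions.Fraction` is used so that agreement is exact rather than approximate.

```python
from fractions import Fraction


def ranks(v):
    """Ranks 1..n of the entries of v (assumes no ties)."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0] * len(v)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r


def grid_score(x, y, i):
    """Brute-force (12.2): share of the n^2 - 1 grid points (X_j, Y_k) other than
    (X_i, Y_i) concordant with it, points tied in one coordinate counted as 1/2."""
    n = len(x)
    score = Fraction(0)
    for j in range(n):
        for k in range(n):
            if j == i and k == i:
                continue
            dx, dy = x[j] - x[i], y[k] - y[i]
            if dx * dy > 0:
                score += 1
            elif dx * dy == 0:  # exactly one of dx, dy is zero here
                score += Fraction(1, 2)
    return score / (n * n - 1)


def rs_from_grid(x, y):
    """6 * (average of (12.2) over the sample points) - 3, as in (12.3)-(12.4)."""
    n = len(x)
    avg = sum((grid_score(x, y, i) for i in range(n)), Fraction(0)) / n
    return 6 * avg - 3


def rs_three_forms(x, y):
    """r_s via (12.4), (12.5) and (12.6)."""
    n = len(x)
    rx, ry = ranks(x), ranks(y)
    s = sum(a * b for a, b in zip(rx, ry))
    f124 = Fraction(12 * s - 3 * n * (n + 1) ** 2, n * (n * n - 1))
    m = Fraction(n + 1, 2)
    cov = sum(((a - m) * (b - m) for a, b in zip(rx, ry)), Fraction(0)) / n
    f125 = cov / Fraction(n * n - 1, 12)
    f126 = 1 - Fraction(6 * sum((a - b) ** 2 for a, b in zip(rx, ry)), n * (n * n - 1))
    return f124, f125, f126
```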

The above motivation for $r_s$ contains a number of arbitrary elements, for example the manner of counting intersections tied in one coordinate with $(X_i, Y_i)$. Let us now consider a less arbitrarily motivated estimator for $\rho_s$; this turns out to be not $r_s$, but rather a linear combination of $r_s$ and $t$. However, for large $n$ it becomes indistinguishable from $r_s$.

This more natural estimator is approached just as we approached the estimator for $\tau$. We begin with the definition $\rho_s = 2\omega_c - 1$, and ask about estimation of $\omega_c$.


13. A MORE NATURAL ESTIMATOR OF $\rho_s$ AND FURTHER REMARKS ABOUT $r_s$


A natural estimator of $\omega_c$, $w_c$, may be obtained by looking at all $\binom{n}{3}$ unordered sample triplets, $((X_i, Y_i), (X_j, Y_j), (X_k, Y_k))$, with no tied subscripts, and scoring 1 or 0 according as there are $\le 1$ inversions of rank order or $\ge 2$ such inversions. Then compute $w_c = \text{total score}/\binom{n}{3}$. In order to bring this into more explicit form, adopt the following device. (I assume $n \ge 3$, so that this approach has meaning.)

Count all concordances between the $n(n-1)(n-2)$ pairs

$$(X_i, Y_i),\ (X_j, Y_k), \qquad i \ne j,\ i \ne k,\ j \ne k. \tag{13.1}$$

We know, from Table 825, that each triple scoring 1 from the above paragraph contributes 4 to this count, and each triple scoring 0 contributes 2 to this count. If the count suggested here yields $c$, say, then

$$c = 4\binom{n}{3}w_c + 2\binom{n}{3}(1 - w_c), \tag{13.2}$$

$$c\Big/\binom{n}{3} = 2w_c + 2; \qquad w_c = \frac{c}{2\binom{n}{3}} - 1. \tag{13.3}$$

Now ask how much $(X_i, Y_i)$ contributes to the count $c$. If we did not have the restriction $j \ne k$ it is clear (just as before) that the number of $(X_j, Y_k)$ points below and to the left of $(X_i, Y_i)$ would be

$$(R_{X_i} - 1)(R_{Y_i} - 1), \tag{13.4}$$

where $R_{X_i}$ and $R_{Y_i}$ are the ranks of $X_i$ among the $X$'s and of $Y_i$ among the $Y$'s. Similarly the number of $(X_j, Y_k)$ points above and to the right of $(X_i, Y_i)$ would be

$$(n - R_{X_i})(n - R_{Y_i}). \tag{13.5}$$

Hence the contribution of $(X_i, Y_i)$ to $c$ is

$$(R_{X_i} - 1)(R_{Y_i} - 1) + (n - R_{X_i})(n - R_{Y_i}) - \text{no. } (X_j, Y_j)\text{'s concordant with } (X_i, Y_i), \tag{13.6}$$

where the last term subtracts off the correct integer to take account of the restriction $j \ne k$. We obtain then

$$2R_{X_i}R_{Y_i} - (n+1)(R_{X_i} + R_{Y_i}) + n^2 + 1 - \text{no. } (X_j, Y_j)\text{'s concordant with } (X_i, Y_i). \tag{13.7}$$

If we sum over $i$, we obtain, with $P_c$ the proportion of concordant sample pairs,

$$c = 2\sum_i R_{X_i}R_{Y_i} - n(n+1)^2 + n^3 + n - n(n-1)P_c = 2\sum_i R_{X_i}R_{Y_i} - 2n^2 - n(n-1)P_c, \tag{13.8}$$

since $\sum_i R_{X_i} = \sum_i R_{Y_i} = n(n+1)/2$, and since each of the $\binom{n}{2}P_c$ concordant pairs is counted twice in the last subtractive term.
Hence

$$w_c = \frac{1}{\binom{n}{3}}\left\{\sum_i R_{X_i}R_{Y_i} - n^2 - \binom{n}{2}P_c\right\} - 1 = \frac{6\sum_i R_{X_i}R_{Y_i}}{n(n-1)(n-2)} - \frac{(n+1)(n+2)}{(n-1)(n-2)} - \frac{3}{n-2}P_c. \tag{13.9}$$
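A brute-force check of (13.2), (13.8), and (13.9) may be sketched in Python as follows. The helpers are illustrative assumptions (untied data, exact `Fraction` arithmetic): `count_c` counts concordances over the $n(n-1)(n-2)$ pairs of (13.1), and `w_c_by_triples` scores each unordered triple 1 when it shows at most one rank inversion.

```python
from fractions import Fraction
from itertools import combinations, permutations
from math import comb


def ranks(v):
    """Ranks 1..n of the entries of v (assumes no ties)."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0] * len(v)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r


def count_c(x, y):
    """Concordances between (X_i, Y_i) and (X_j, Y_k) over distinct i, j, k, as in (13.1)."""
    return sum(1 for i, j, k in permutations(range(len(x)), 3)
               if (x[i] - x[j]) * (y[i] - y[k]) > 0)


def w_c_by_triples(x, y):
    """Share of unordered triples with at most one inversion of rank order."""
    pts = list(zip(x, y))
    good = 0
    for tri in combinations(pts, 3):
        ys = [p[1] for p in sorted(tri)]  # y values in increasing order of x
        inversions = sum(ys[u] > ys[v] for u, v in combinations(range(3), 2))
        good += (inversions <= 1)
    return Fraction(good, comb(len(pts), 3))


def concordant_pairs(x, y):
    """Number of concordant unordered sample pairs, i.e. C(n, 2) * P_c."""
    return sum((x[i] - x[j]) * (y[i] - y[j]) > 0
               for i, j in combinations(range(len(x)), 2))


def w_c_formula(x, y):
    """(13.9), with P_c = concordant pairs / C(n, 2)."""
    n = len(x)
    s = sum(a * b for a, b in zip(ranks(x), ranks(y)))
    p_c = Fraction(concordant_pairs(x, y), comb(n, 2))
    return (Fraction(6 * s, n * (n - 1) * (n - 2))
            - Fraction((n + 1) * (n + 2), (n - 1) * (n - 2))
            - Fraction(3, n - 2) * p_c)
```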
Next, to estimate $\rho_s$, consider $2w_c - 1$, and, for the sake of symmetry, replace $P_c$ by $(t+1)/2$ (in the no-ties case $P_c + P_d = 1$, so that $t = 2P_c - 1$). We find

$$2w_c - 1 = \frac{12\sum_i R_{X_i}R_{Y_i}}{n(n-1)(n-2)} - \frac{2(n+1)(n+2)}{(n-1)(n-2)} - \frac{3(t+1)}{n-2} - 1 = \frac{1}{n(n-1)(n-2)}\left\{12\sum_i R_{X_i}R_{Y_i} - 3n(n+1)^2\right\} - \frac{3t}{n-2} = \frac{n+1}{n-2}\,r_s - \frac{3}{n-2}\,t. \tag{13.10}$$

Note that, for large $n$, this is virtually the same as $r_s$.
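Identity (13.10) can be checked in the same spirit. The sketch below (hypothetical names; untied data assumed) computes $2w_c - 1$ directly from the triples and compares it with $[(n+1)/(n-2)]r_s - [3/(n-2)]t$ using exact fractions.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def ranks(v):
    """Ranks 1..n of the entries of v (assumes no ties)."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0] * len(v)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r


def spearman_rs(x, y):
    """r_s via (12.6)."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - Fraction(6 * d2, n * (n * n - 1))


def kendall_t(x, y):
    """t = (concordant - discordant) / C(n, 2) for untied data."""
    s = 0
    for i, j in combinations(range(len(x)), 2):
        prod = (x[i] - x[j]) * (y[i] - y[j])
        s += (prod > 0) - (prod < 0)
    return Fraction(s, comb(len(x), 2))


def unbiased_grade_correlation(x, y):
    """2 w_c - 1, where w_c is the share of triples with at most one rank inversion."""
    pts = list(zip(x, y))
    good = 0
    for tri in combinations(pts, 3):
        ys = [p[1] for p in sorted(tri)]
        good += (sum(ys[u] > ys[v] for u, v in combinations(range(3), 2)) <= 1)
    return 2 * Fraction(good, comb(len(pts), 3)) - 1
```

The triple enumeration costs $O(n^3)$, which is why, as the text notes, $r_s$ is the easier of the two to compute.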

Note also that, since the probability is $\omega_c$ that a random triple of sample points shows 0 or 1 inversions in rank order, $E w_c = \omega_c$, and consequently the expected value of the above estimator is $\rho_s$. Hence, since $E t = \tau$,

$$\frac{n+1}{n-2}\,E r_s - \frac{3}{n-2}\,\tau = \rho_s, \tag{13.11}$$

$$E r_s = \rho_s + \frac{3(\tau - \rho_s)}{n+1}. \tag{13.12}$$

This shows that $r_s$ is in general a biased estimator of $\rho_s$, but that the bias rapidly goes to zero as $n$ grows.

The $\rho_s$ estimator $2w_c - 1 = [(n+1)/(n-2)]r_s - [3/(n-2)]t$ has two advantages when compared to $r_s$: it is more naturally motivated and it is unbiased. On the other hand, $r_s$ is more easily computed from a sample and is much more commonly used. Konijn [53] has called $2w_c - 1$ the unbiased grade correlation. It, together with $r_s$, has been studied carefully by Hoeffding [44], where general expressions for the variances of both quantities are given.

If $X$ and $Y$ are independent, $\operatorname{Var} r_s = 1/(n-1)$ and $\operatorname{Var}(2w_c - 1) = (n^2 - 3)/[n(n-1)(n-2)]$.
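For small $n$ both null variances can be verified exactly, because under independence (with continuous marginals) all $n!$ pairings of the $Y$-ranks with the $X$-ranks are equally likely. The following Python sketch, with illustrative function names, enumerates them with exact fractions.

```python
from fractions import Fraction
from itertools import combinations, permutations
from math import comb


def spearman_rs_ranks(rx, ry):
    """r_s via (12.6), given the two rank vectors."""
    n = len(rx)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - Fraction(6 * d2, n * (n * n - 1))


def grade_corr_ranks(rx, ry):
    """2 w_c - 1 computed directly from the triples of rank pairs."""
    pts = list(zip(rx, ry))
    good = 0
    for tri in combinations(pts, 3):
        ys = [p[1] for p in sorted(tri)]
        good += (sum(ys[u] > ys[v] for u, v in combinations(range(3), 2)) <= 1)
    return 2 * Fraction(good, comb(len(pts), 3)) - 1


def null_variance(stat, n):
    """Exact variance of stat when every pairing of Y-ranks with X-ranks is equally likely."""
    xs = list(range(1, n + 1))
    values = [stat(xs, list(p)) for p in permutations(xs)]
    mean = sum(values, Fraction(0)) / len(values)
    return sum(((v - mean) ** 2 for v in values), Fraction(0)) / len(values)
```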

If $X$ and $Y$ are jointly normally distributed, the variance of $r_s$ may be approximated by the first terms of a series expansion; see Kendall [51, second ed., p. 130]. The distribution of $r_s$ in the bivariate normal case has been investigated by Fieller, Hartley, and Pearson [30a], using empirical sampling methods. In the same article the inverse hyperbolic tangent function is considered as a normalizing and variance-stabilizing transformation.

For a test of independence, $r_s$ may be used as a test statistic. Its exact distribution under the hypothesis of independence has been tabulated for $n$ through 8 by M. G. Kendall et al. [52] and for $n = 9, 10$ by S. T. David et al. [16]. (See also Olds [65] and Thornton [84]. The distributions are tabulated in Kendall's monograph [51].) Hotelling and Pabst [46] have shown that, when independence holds,

$$\sqrt{n}\,r_s \tag{13.13}$$

is asymptotically unit-normal. The more general asymptotic distribution of $r_s$ has been discussed by Hoeffding [44].

When independence holds, M. G. Kendall has found that a reasonable approximation to the distribution of $r_s$ may be obtained by treating

$$r_s\sqrt{\frac{n-2}{1 - r_s^2}}$$

as distributed according to Student's $t$ distribution with $n - 2$ degrees of freedom. A table of one and five per cent two-sided critical values, for $n = 5(1)40$, based on this approximation, is given by Litchfield and Wilcoxon [62]. The asymptotic efficiency (Pitman sense) of the test of independence based on $r_s$, in the normal case, is $9/\pi^2 \approx .91$ relative to that of the test based on the correlation coefficient. This is the same value as that for the test based on $t$.
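A sketch of Kendall's approximation, with an exact enumeration for comparison at small $n$, follows. The function names are hypothetical. The Student $t$ tail probability itself is not computed here, since the Python standard library has no $t$ distribution function; the statistic would be referred to a table (or a statistics package) with $n - 2$ degrees of freedom.

```python
import math
from fractions import Fraction
from itertools import permutations


def spearman_rs_ranks(rx, ry):
    """r_s via (12.6), given the two rank vectors."""
    n = len(rx)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - Fraction(6 * d2, n * (n * n - 1))


def kendall_approx_statistic(rs, n):
    """r_s * sqrt((n - 2) / (1 - r_s**2)), to be referred to Student's t with n - 2 d.f."""
    if not -1 < rs < 1:
        raise ValueError("undefined when |r_s| = 1")
    return rs * math.sqrt((n - 2) / (1 - rs * rs)), n - 2


def exact_upper_tail(rs_obs, n):
    """Exact null Pr{r_s >= rs_obs}, enumerating all n! rankings (small n only)."""
    xs = list(range(1, n + 1))
    hits = sum(spearman_rs_ranks(xs, list(p)) >= rs_obs for p in permutations(xs))
    return Fraction(hits, math.factorial(n))
```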


14. RELATIVE MERITS OF $\tau$ AND $\rho_s$, $t$ AND $r_s$


As between $r_s$ and $t$ qua estimators, they are not really in general competition, for they are estimators of different population quantities. It may be noted that $\tau$ is simpler to interpret than $\rho_s$. Some authors consider it important that $r_s$ is easier to compute than $t$.

As between $r_s$ and $t$ qua test statistics for a test of independence, it may be pointed out that they presumably have power against different sorts of alternatives to the null hypothesis of independence. A rather extreme example is the population in which the mass is spread smoothly along two curves as in Fig. 830, for a suitable choice of the parameter $a$. Here $\tau = 0$ but $\rho_s \ne 0$. If populations akin to that pictured were the important alternatives to independence, $r_s$ would undoubtedly be a better test statistic than $t$.

Konijn [53] presents an interesting and novel approach to the power of tests for independence based on $q$, $t$, $r_s$, and $w_c$. In this approach the family of alternatives is taken to be those bivariate distributions derived by linear transformations from bivariate distributions with independent marginals. Konijn, in a paper to appear in Sankhya, has considered another interesting restricted family of bivariate distributions. Another approach to power has been suggested in a recent paper by Barton and David [5].

In the bivariate normal case, the estimators of $\rho$ based on $t$ and $r_s$ have a correlation approaching unity as $n \to \infty$.

It is clear that many other ordinal measures of association might be proposed. For example, one might base such a measure on the probability that all pairs out of three observations are concordant. Or one might base a measure on $\Pi_{cc}$. And so on.

On the grounds of simplicity of interpretation, reasonable sensitivity to form of distribution, and relative simplicity of sampling theory, I prefer the use of $\tau$ and $t$ to that of $\rho_s$ and $r_s$.


15. GENERAL DISTRIBUTIONS, AND ANOTHER APPROACH 
TO t IN THE NO-TIE CASE 


If the assumption that $X$ and $Y$ have continuous marginal distributions be dropped, then many of the manipulations presented or suggested in the preceding sections become false or incompletely defined. For then ties in one coordinate or the other may occur with positive probability. However, $\tau$ and $\rho_s$ may still be defined.

All we need do, in the case of $\tau$, is to redefine $\Pi_c$ and $\Pi_d$ as the probabilities of the same chance events as before, but now conditionally on the absence of ties. Thus, we need only set, for example,

$$\Pi_c = \Pr\{(X_1 - X_2)(Y_1 - Y_2) > 0 \mid X_1 \ne X_2 \text{ and } Y_1 \ne Y_2\}. \tag{15.1}$$


This quantity may be considered as well defined in general; the only exceptions 
are the degenerate cases in which the entire distribution is concentrated on a 
horizontal or vertical line. The definitions hold perfectly well in the continuous 
case earlier discussed, for then the probability of the condition is unity. 

What is the natural sample estimator for $\tau$ in the general case? Let us look first at that for $\Pi_c$. As before, we start by observing the relative number of sample pairs for which concordance obtains. But now we must divide by the relative number of sample pairs satisfying the no-ties condition. This suggests the quantity

$$P_c = \frac{\text{no. concordant sample pairs}}{\text{no. no-tie sample pairs}}. \tag{15.2}$$

To say that the pair of observations $(X_i, Y_i)$, $(X_j, Y_j)$ $(i \ne j)$ is a no-tie pair is to say that $X_i \ne X_j$ and $Y_i \ne Y_j$. The analogous definition for $P_d$ is clear, and so $t$ may be defined for the general case as $P_c - P_d$, just as before.
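A minimal Python sketch of the tie-adjusted $t$ of (15.2), in which both $P_c$ and $P_d$ are proportions of the no-tie pairs, might look like this (the function name is illustrative).

```python
from fractions import Fraction
from itertools import combinations


def t_with_ties(x, y):
    """t = P_c - P_d, both proportions taken over the no-tie pairs as in (15.2)."""
    conc = disc = no_tie = 0
    for i, j in combinations(range(len(x)), 2):
        dx, dy = x[i] - x[j], y[i] - y[j]
        if dx == 0 or dy == 0:
            continue  # tied in at least one coordinate: not a no-tie pair
        no_tie += 1
        if dx * dy > 0:
            conc += 1
        else:
            disc += 1
    if no_tie == 0:
        raise ValueError("no untied pairs; t is undefined")
    return Fraction(conc - disc, no_tie)
```

With no ties present, every pair is a no-tie pair and the function reduces to the ordinary $t$.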

The above variant of $t$ for tied observations, or some slight modification of it, has been suggested occasionally in the literature, perhaps first by Deuchler (see section 17). A recent discussion of it is presented by Adler [1]. For approaches to $t$ when ties are present, using other denominators than that of (15.2), see Kendall [51, Chapter 3 and the references there given].

A particularly interesting kind of noncontinuous distribution is that in which $X$ and $Y$ can each take only a finite number of values. Without loss of generality (because of ordinal invariance) we may suppose that $X$ takes values $1, 2, \dots, \alpha$ and $Y$ takes values $1, 2, \dots, \beta$. The joint distribution of $X$ and $Y$ may then be described by the probabilities $p_{ab}$ ($a = 1, \dots, \alpha$; $b = 1, \dots, \beta$), the probabilities that $X = a$ and $Y = b$. This is just the cross classification situation of [35] with order in both classifications.

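The probability of a tie in one or both coordinates when taking two observations is

$$\Pi_t = \sum_a p_{a\cdot}^2 + \sum_b p_{\cdot b}^2 - \sum_a\sum_b p_{ab}^2. \tag{15.3}$$

Here $p_{a\cdot} = \sum_b p_{ab}$ and $p_{\cdot b} = \sum_a p_{ab}$, the marginal probabilities that $X = a$ and that $Y = b$ respectively. It is easily shown that

$$\Pi_c = \frac{2}{1 - \Pi_t}\sum_a\sum_b p_{ab}\left\{\sum_{a'>a}\sum_{b'>b} p_{a'b'}\right\}, \qquad \Pi_d = \frac{2}{1 - \Pi_t}\sum_a\sum_b p_{ab}\left\{\sum_{a'>a}\sum_{b'<b} p_{a'b'}\right\}. \tag{15.4}$$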
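Formulas (15.3) and (15.4) translate directly into code. In the sketch below (hypothetical names), `p[a][b]` holds $\Pr\{X = a+1, Y = b+1\}$ as an exact fraction. The helper `sample_table` builds the $n \times n$ table that puts mass $1/n$ on each untied sample point, which anticipates the observation below that $\tau$ for that table equals the sample $t$.

```python
from fractions import Fraction
from itertools import combinations


def cross_class_tau(p):
    """Return (Pi_t, Pi_c, Pi_d, tau) from (15.3)-(15.4); p[a][b] = Pr{X = a+1, Y = b+1}."""
    A, B = len(p), len(p[0])
    px = [sum(p[a]) for a in range(A)]                       # p_{a.}
    py = [sum(p[a][b] for a in range(A)) for b in range(B)]  # p_{.b}
    pi_t = (sum(v * v for v in px) + sum(v * v for v in py)
            - sum(p[a][b] ** 2 for a in range(A) for b in range(B)))
    conc = disc = Fraction(0)
    for a in range(A):
        for b in range(B):
            upper_right = sum(p[i][j] for i in range(a + 1, A) for j in range(b + 1, B))
            lower_right = sum(p[i][j] for i in range(a + 1, A) for j in range(b))
            conc += p[a][b] * upper_right
            disc += p[a][b] * lower_right
    pi_c = 2 * conc / (1 - pi_t)
    pi_d = 2 * disc / (1 - pi_t)
    return pi_t, pi_c, pi_d, pi_c - pi_d


def ranks(v):
    """Ranks 1..n of the entries of v (assumes no ties)."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0] * len(v)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r


def sample_table(x, y):
    """n x n table with mass 1/n at (rank of X_i, rank of Y_i); assumes no ties."""
    n = len(x)
    p = [[Fraction(0)] * n for _ in range(n)]
    for a, b in zip(ranks(x), ranks(y)):
        p[a - 1][b - 1] = Fraction(1, n)
    return p


def kendall_t(x, y):
    """t = (concordant - discordant) / C(n, 2) for untied data."""
    s = 0
    for i, j in combinations(range(len(x)), 2):
        prod = (x[i] - x[j]) * (y[i] - y[j])
        s += (prod > 0) - (prod < 0)
    return Fraction(s, len(x) * (len(x) - 1) // 2)
```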
See [35] for a detailed discussion of the approach in this case, but note that there are a few minor differences of notation between [35] and the present paper. In particular, $\Pi_s$ of [35] is here denoted by $\Pi_c(1 - \Pi_t)$, and $\Pi_d$ of [35] is here $\Pi_d(1 - \Pi_t)$. Further, "a" refers here to columns and "b" to rows; in [35] the opposite usage holds. In [35], $\Pi_c - \Pi_d$ of this paper is called $\gamma$.

From this point of view another motivation for $t$ in the continuous case may be given: $t$ is just the value of $\tau$ for the sample pattern of points, each taken with probability $1/n$. For consider the sample of $n$ $(X_i, Y_i)$'s as defining a discrete distribution in which $X$ can take values $X_1, \dots, X_n$, in which $Y$ can take values $Y_1, \dots, Y_n$, and for which $p_{ab} = 1/n$ when $(a, b) = (X_i, Y_i)$ and zero otherwise. This discrete distribution may be represented as an $n \times n$ table of probabilities in which all entries are zero except that exactly one entry in each row and column has $p_{ab} = 1/n$.

We readily compute that, in this case,

$$\Pi_t = 1/n, \qquad 1/(1 - \Pi_t) = n/(n-1),$$

$$\Pi_c = \frac{\text{no. concordant unordered pairs } ((X_i, Y_i), (X_j, Y_j))}{\binom{n}{2}}, \qquad \Pi_d = \frac{\text{no. discordant unordered pairs } ((X_i, Y_i), (X_j, Y_j))}{\binom{n}{2}}, \tag{15.5}$$

so that $\tau$ for the uniform discrete distribution on the points of the sample $\{(X_i, Y_i)\}$ is precisely $t$ for the points considered qua sample.


16. $\rho_s$ IN THE GENERAL CASE, AND ANOTHER APPROACH
TO $r_s$ IN THE NO-TIE CASE


Let us now turn to analogous manipulations for ps and rs. For any bivariate 
distribution, we may consider the following disjoint chance events relating to 
three independent observations: 

C: at least one of the three is clearly concordant with the other two;
D: at least one of the three is clearly discordant with the other two. (16.1)

To say that $(X_i, Y_i)$ and $(X_j, Y_j)$ are clearly concordant (discordant) is to say that $(X_i - X_j)(Y_i - Y_j) > 0$ $(< 0)$. Emphasis is on the strict inequality.

If the marginals are continuous, then $\omega_c = \Pr\{C\}$ and $\omega_d = \Pr\{D\}$. But if ties can occur, it seems natural to generalize the definitions of $\omega_c$ and $\omega_d$, consistently with (15.2), thus:

$$\omega_c = \Pr\{C \mid C \text{ or } D\} = \frac{\Pr\{C\}}{\Pr\{C\} + \Pr\{D\}}, \qquad \omega_d = \Pr\{D \mid C \text{ or } D\} = \frac{\Pr\{D\}}{\Pr\{C\} + \Pr\{D\}}, \tag{16.2}$$

to take account of the fact that patterns may occur that fall into neither of the events C or D. For example, the pattern of Fig. 847 is of this kind. On the other hand, some patterns with ties may fall into C or D (but not both). For example, that of Fig. 847a falls into C.

Fig. 847. Pattern falling into neither C nor D. Fig. 847a. Pattern falling into C. Dots are observations. Lower two observations tied in y coordinates.


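Adopting (16.2), it is natural to define $\rho_s$ as $\omega_c - \omega_d$ generally. In the no-ties case this definition coincides with our earlier one. Reasonable estimators of $\omega_c$, $\omega_d$, and $\rho_s$ from a random sample of $n\ (\ge 3)$ would be, following section 15, to look at all $\binom{n}{3}$ unordered triples from the sample and to set

$$w_c = \frac{\text{no. of the } \binom{n}{3} \text{ triples in } C}{\text{no. of the } \binom{n}{3} \text{ triples in } C \text{ or } D}, \qquad w_d = \frac{\text{no. of the } \binom{n}{3} \text{ triples in } D}{\text{no. of the } \binom{n}{3} \text{ triples in } C \text{ or } D},$$

$$\text{estimator of } \rho_s = \frac{\text{no. of the } \binom{n}{3} \text{ triples in } C - \text{no. of the } \binom{n}{3} \text{ triples in } D}{\text{no. of the } \binom{n}{3} \text{ triples in } C \text{ or } D}. \tag{16.3}$$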
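A direct, unoptimized Python sketch of (16.1) and (16.3) follows. It enumerates all $\binom{n}{3}$ triples, so it is meant only for small samples, and the names are illustrative. The two test patterns in the tests are of the kinds described for Figs. 847 and 847a (the lower two observations tied in $y$).

```python
from fractions import Fraction
from itertools import combinations


def classify_triple(tri):
    """'C' or 'D' per (16.1), or None if the triple is in neither event."""
    for i in range(3):
        prods = [(tri[i][0] - tri[j][0]) * (tri[i][1] - tri[j][1])
                 for j in range(3) if j != i]
        if all(v > 0 for v in prods):
            return "C"  # observation i clearly concordant with both others
        if all(v < 0 for v in prods):
            return "D"  # observation i clearly discordant with both others
    return None


def rho_s_estimates(x, y):
    """(w_c, w_d, w_c - w_d) from the triples in C or D, as in (16.3)."""
    n_c = n_d = 0
    for tri in combinations(list(zip(x, y)), 3):
        kind = classify_triple(tri)
        n_c += (kind == "C")
        n_d += (kind == "D")
    if n_c + n_d == 0:
        raise ValueError("no triple falls in C or D")
    w_c = Fraction(n_c, n_c + n_d)
    w_d = Fraction(n_d, n_c + n_d)
    return w_c, w_d, w_c - w_d
```

Returning at the first qualifying observation is safe because, as the text notes, a triple cannot lie in both C and D.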


Note that, if ties can occur, it is possible that two observations will be tied in 
both coordinates. In such a case, if the third observation of a triple is tied in 
neither coordinate with the two, the triple must be in either C or D. 

Next, let us ask what the generalized $\rho_s$ will be for a cross classification. Notation will be the same as that of section 15, and the computation will be carried out in two steps for simplicity.

First, we recall a classical formula of elementary probability. If $A_1, A_2, A_3$ are three chance events, then

$$\Pr\{A_1 \text{ or } A_2 \text{ or } A_3\} = \Pr\{A_1\} + \Pr\{A_2\} + \Pr\{A_3\} - \Pr\{A_1 \text{ and } A_2\} - \Pr\{A_1 \text{ and } A_3\} - \Pr\{A_2 \text{ and } A_3\} + \Pr\{A_1 \text{ and } A_2 \text{ and } A_3\}. \tag{16.4}$$
Next, let the event $A_1$ be defined as follows:

$A_1$: of three observations, $(X_i, Y_i)$, $i = 1, 2, 3$, observation 1 is concordant with observations 2 and 3.

$A_2$ and $A_3$ are similarly defined, with observations 2 and 3 respectively playing the role of observation 1 in the definition of $A_1$.

It is then clear that $C = A_1$ or $A_2$ or $A_3$, so that (using symmetry, and the fact that "$A_i$ and $A_j$" means that all three observations are concordant)

$$\Pr\{C\} = 3\Pr\{A_1\} - 3\Pr\{\text{all three conc.}\} + \Pr\{\text{all three conc.}\} = 3\Pi_{cc} - 2\Pi_{ccc}, \tag{16.5}$$

where $\Pi_{cc}$ has its earlier meaning, $\Pr\{A_1\}$, and $\Pi_{ccc}$ is the probability that, of three independent observations, all are concordant. (It is convenient here to maintain unconditional senses for $\Pi_{cc}$ and $\Pi_{ccc}$.)

Next, let us introduce the notation

$$\mathrm{I}_{ab} = \sum_{a'>a}\sum_{b'>b} p_{a'b'}, \qquad \mathrm{II}_{ab} = \sum_{a'<a}\sum_{b'>b} p_{a'b'}, \qquad \mathrm{III}_{ab} = \sum_{a'<a}\sum_{b'<b} p_{a'b'}, \qquad \mathrm{IV}_{ab} = \sum_{a'>a}\sum_{b'<b} p_{a'b'}. \tag{16.6}$$

The mnemonic for this notation is the conventional numbering of the quadrants around $(X, Y) = (a, b)$.

It is clear that

$$\Pi_{cc} = \sum_a\sum_b p_{ab}\left(\mathrm{I}_{ab} + \mathrm{III}_{ab}\right)^2 \tag{16.7}$$

by summing the quantities $\Pr\{A_1 \text{ and } (X_1, Y_1) = (a, b)\}$. Similarly, by summing the six equal quantities of the form Pr{all three concordant, with $(X_1, Y_1)$ in lower left position, $(X_2, Y_2)$ in middle, and $(X_3, Y_3)$ in upper right}, each of which in turn is taken as a sum of probabilities with the middle observation fixed, we obtain

$$\Pi_{ccc} = 6\sum_a\sum_b p_{ab}\,\mathrm{I}_{ab}\,\mathrm{III}_{ab}. \tag{16.8}$$

Substituting (16.7) and (16.8) into (16.5), and using $(\mathrm{I} + \mathrm{III})^2 - 4\,\mathrm{I}\,\mathrm{III} = (\mathrm{I} - \mathrm{III})^2$, we have

$$\Pr\{C\} = 3\sum_a\sum_b p_{ab}\left(\mathrm{I}_{ab} - \mathrm{III}_{ab}\right)^2, \tag{16.9}$$

and, similarly,

$$\Pr\{D\} = 3\sum_a\sum_b p_{ab}\left(\mathrm{II}_{ab} - \mathrm{IV}_{ab}\right)^2.$$
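Because (16.9) rests on the inclusion-exclusion argument and the quadrant sums (16.6), it is easy to check by brute force on a small table. The sketch below (illustrative names, exact fractions, cell indices counted from 0) sums the probabilities of all ordered triples of cells that fall into C or D and compares the totals with $3\sum_a\sum_b p_{ab}(\mathrm{I}_{ab} - \mathrm{III}_{ab})^2$ and $3\sum_a\sum_b p_{ab}(\mathrm{II}_{ab} - \mathrm{IV}_{ab})^2$.

```python
from fractions import Fraction
from itertools import product


def quadrant_sums(p, a, b):
    """(I, II, III, IV) of (16.6) for cell (a, b); cells indexed from 0."""
    A, B = len(p), len(p[0])
    q1 = sum((p[i][j] for i in range(a + 1, A) for j in range(b + 1, B)), Fraction(0))
    q2 = sum((p[i][j] for i in range(a) for j in range(b + 1, B)), Fraction(0))
    q3 = sum((p[i][j] for i in range(a) for j in range(b)), Fraction(0))
    q4 = sum((p[i][j] for i in range(a + 1, A) for j in range(b)), Fraction(0))
    return q1, q2, q3, q4


def in_event(cells, sign):
    """True if some observation is clearly concordant (sign=+1) or
    clearly discordant (sign=-1) with both of the others."""
    for i in range(3):
        if all(sign * (cells[i][0] - cells[j][0]) * (cells[i][1] - cells[j][1]) > 0
               for j in range(3) if j != i):
            return True
    return False


def pr_C_D_bruteforce(p):
    """Sum the probabilities of all ordered cell triples lying in C, and in D."""
    cells = [(a, b) for a in range(len(p)) for b in range(len(p[0]))]
    pc = pd = Fraction(0)
    for tri in product(cells, repeat=3):
        w = p[tri[0][0]][tri[0][1]] * p[tri[1][0]][tri[1][1]] * p[tri[2][0]][tri[2][1]]
        if in_event(tri, +1):
            pc += w
        elif in_event(tri, -1):
            pd += w
    return pc, pd


def pr_C_D_formula(p):
    """(16.9) and its analog for D."""
    pc = pd = Fraction(0)
    for a in range(len(p)):
        for b in range(len(p[0])):
            q1, q2, q3, q4 = quadrant_sums(p, a, b)
            pc += 3 * p[a][b] * (q1 - q3) ** 2
            pd += 3 * p[a][b] * (q2 - q4) ** 2
    return pc, pd
```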


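Thus, for a cross classification, we may define a measure of association, analogous to $\rho_s$, as

$$\frac{\sum_a\sum_b p_{ab}\left[\left(\mathrm{I}_{ab} - \mathrm{III}_{ab}\right)^2 - \left(\mathrm{II}_{ab} - \mathrm{IV}_{ab}\right)^2\right]}{\sum_a\sum_b p_{ab}\left[\left(\mathrm{I}_{ab} - \mathrm{III}_{ab}\right)^2 + \left(\mathrm{II}_{ab} - \mathrm{IV}_{ab}\right)^2\right]}. \tag{16.10}$$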
This quantity might be useful in the analysis of cross classifications, and it is offered as another measure of association for such analyses, along with the measures of [35]. It is a direct analog of $\rho_s$ in the no-ties case for the wholly discrete case, and it has a clear operational interpretation, though not as direct as that of the corresponding analog to $\tau$. The computation of (16.10) is somewhat harder than that of the analog to $\tau$. (16.10) has the following properties:

1. (16.10) is defined unless some $p_{a\cdot}$ or $p_{\cdot b}$ equals 1.

2. (16.10) lies between $-1$ and $1$ inclusive.

3. (16.10) is 1 if and only if $\Pr\{D\} = 0$. This means that there is no pair of positive cell probabilities, $p_{ab}$ and $p_{a'b'}$, such that $(a, b)$ is discordant with $(a', b')$.

4. (16.10) is $-1$ if and only if $\Pr\{C\} = 0$. This means that there is no pair of positive cell probabilities, $p_{ab}$ and $p_{a'b'}$, such that $(a, b)$ is concordant with $(a', b')$.

5. (16.10) need not be zero in the case of independence. However, if at least one set of marginals is uniform (i.e., all $p_{a\cdot} = 1/\alpha$ or all $p_{\cdot b} = 1/\beta$), then (16.10) is zero under independence, but not conversely in general.

6. In the $2 \times 2$ case, (16.10) is

$$\frac{p_{11}p_{22}(p_{11} + p_{22}) - p_{12}p_{21}(p_{12} + p_{21})}{p_{11}p_{22}(p_{11} + p_{22}) + p_{12}p_{21}(p_{12} + p_{21})}. \tag{16.11}$$
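(16.10) can be computed from the quadrant sums of (16.6). The following sketch (hypothetical names, exact fractions, `p[a][b]` with $a$ the $X$ category and $b$ the $Y$ category, both counted from 0) checks it against the $2 \times 2$ form (16.11) and against $r_s$ for the table that puts mass $1/n$ on each point of an untied sample, a fact established in the text that follows.

```python
from fractions import Fraction


def quadrant_sums(p, a, b):
    """(I, II, III, IV) of (16.6) for cell (a, b); cells indexed from 0."""
    A, B = len(p), len(p[0])
    q1 = sum((p[i][j] for i in range(a + 1, A) for j in range(b + 1, B)), Fraction(0))
    q2 = sum((p[i][j] for i in range(a) for j in range(b + 1, B)), Fraction(0))
    q3 = sum((p[i][j] for i in range(a) for j in range(b)), Fraction(0))
    q4 = sum((p[i][j] for i in range(a + 1, A) for j in range(b)), Fraction(0))
    return q1, q2, q3, q4


def measure_16_10(p):
    """The cross-classification analog (16.10) of rho_s."""
    num = den = Fraction(0)
    for a in range(len(p)):
        for b in range(len(p[0])):
            q1, q2, q3, q4 = quadrant_sums(p, a, b)
            c_part, d_part = (q1 - q3) ** 2, (q2 - q4) ** 2
            num += p[a][b] * (c_part - d_part)
            den += p[a][b] * (c_part + d_part)
    return num / den


def ranks(v):
    """Ranks 1..n of the entries of v (assumes no ties)."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0] * len(v)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r


def sample_table(x, y):
    """n x n table with mass 1/n at (rank of X_i, rank of Y_i)."""
    n = len(x)
    p = [[Fraction(0)] * n for _ in range(n)]
    for a, b in zip(ranks(x), ranks(y)):
        p[a - 1][b - 1] = Fraction(1, n)
    return p


def spearman_rs(x, y):
    """r_s via (12.6)."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - Fraction(6 * d2, n * (n * n - 1))
```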


Property 5 is unhappy. It reflects the fact that, if the coordinates of two observations in a class C pattern are cross-switched ($X_1 \leftrightarrow Y_2$, $X_2 \leftrightarrow Y_1$), the resulting pattern may be in neither class C nor D. To surmount this difficulty, we might first change the cross classification so as to make one set of marginals equal, in the manner discussed in section 5.4 of [35]. Or, more basically, we might redefine C as the set of all patterns with concordance between one of the observations and the other two and no ties at all. Although this complicates a bit the formulation of the analog to (16.10), I shall describe the reformulation shortly.

First, however, let us show, analogously to the last part of section 15, that $r_s$ in the no-ties case may be motivated as the value of (16.10) for the sample, each point taken with probability $1/n$. Consider, then, the sample of $(X_i, Y_i)$'s, with no marginal ties, as defining a discrete distribution in which $X$ can take values $X_1, \dots, X_n$, $Y$ can take values $Y_1, \dots, Y_n$, and $p_{ab} = 0$ unless $(a, b) = (X_i, Y_i)$, when $p_{ab} = 1/n$.

For this case we note that, for $(a, b)$ such that $p_{ab} = 1/n$,

$$\mathrm{I}_{ab} + \mathrm{IV}_{ab} = (n - R_x)/n, \qquad \mathrm{I}_{ab} + \mathrm{II}_{ab} = (n - R_y)/n, \qquad \mathrm{II}_{ab} + \mathrm{III}_{ab} = (R_x - 1)/n, \qquad \mathrm{III}_{ab} + \mathrm{IV}_{ab} = (R_y - 1)/n, \tag{16.12}$$

where $R_x$ and $R_y$ are the ranks of the coordinates of that $(X_i, Y_i)$ that equals $(a, b)$. To see why (16.12) holds, one need, for example, only note that $n(\mathrm{I}_{ab} + \mathrm{IV}_{ab})$ is the number of observations with $x$ coordinate greater than that of the $(X_i, Y_i)$ equal to $(a, b)$.
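From (16.12) we find that (when $p_{ab} = 1/n$)

$$\mathrm{I}_{ab} - \mathrm{III}_{ab} = \frac{n - R_y}{n} - \frac{R_x - 1}{n} = (n - R_x - R_y + 1)/n$$

and that

$$\mathrm{II}_{ab} - \mathrm{IV}_{ab} = \frac{n - R_y}{n} - \frac{n - R_x}{n} = (R_x - R_y)/n.$$

Substituting in (16.10), and carrying out routine algebra, we find that in this case the denominator of (16.10) is

$$\frac{1}{3}\,\frac{n^2 - 1}{n^2}$$

and that the numerator of (16.10) is

$$\frac{4}{n^3}\sum_i R_{X_i}R_{Y_i} - \frac{(n+1)^2}{n^2} = \frac{1}{3}\,\frac{n^2 - 1}{n^2}\,r_s,$$

whence (16.10), for the uniform distribution over the points $\{(X_i, Y_i)\}$, is just $r_s$ for that sample.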

Let us now reformulate the definitions of C and D, thus constructing a reasonable second analog to $\rho_s$ in the general case. Define, for three independent observations, the disjoint chance events

C*: at least one of the three is concordant with the other two, and there are no ties;
D*: at least one of the three is discordant with the other two, and there are no ties. (16.13)

Based on (16.13), the natural generalization of $\rho_s$ would be, not
$$[\Pr\{C\} - \Pr\{D\}]/[\Pr\{C\} + \Pr\{D\}]$$
as in the earlier part of this section, but
$$[\Pr\{C^*\} - \Pr\{D^*\}]/[\Pr\{C^*\} + \Pr\{D^*\}].$$

Both possible generalizations, of course, coincide with $\rho_s$ in the no-ties case. There is no difficulty, in principle, about stating a reasonable estimator for the new generalization: one simply takes the formulas of (16.3) and replaces C by C*, and D by D*.

Next, what will this second generalization of $\rho_s$ be for a cross classification? If we define, in our present spirit, $\Pi_{cc}^*$ as the probability that, of three independent observations, observations 2 and 3 are concordant with observation 1 and there are no ties among the three, then, in the same way that (16.5) was derived,

$$\Pr\{C^*\} = 3\Pi_{cc}^* - 2\Pi_{ccc}.$$
Next, introduce notation analogous to that of (16.6), as follows:

$$\mathrm{I}^{*2}_{ab} = \sum_{\substack{a', a'' > a \\ a' \ne a''}}\ \sum_{\substack{b', b'' > b \\ b' \ne b''}} p_{a'b'}\,p_{a''b''}, \qquad \mathrm{II}^{*2}_{ab} = \sum_{\substack{a', a'' < a \\ a' \ne a''}}\ \sum_{\substack{b', b'' > b \\ b' \ne b''}} p_{a'b'}\,p_{a''b''}, \tag{16.14}$$

with similar definitions for $\mathrm{III}^{*2}_{ab}$ and $\mathrm{IV}^{*2}_{ab}$. Their interpretations are simple: $\mathrm{I}^{*2}_{ab}$, for example, is the probability that two independent observations from the cross classification are both in the first quadrant with respect to the $(a, b)$ cell and are tied in neither coordinate.
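It is then clear that

$$\Pi_{cc}^* = \sum_a\sum_b p_{ab}\left(\mathrm{I}^{*2}_{ab} + 2\,\mathrm{I}_{ab}\,\mathrm{III}_{ab} + \mathrm{III}^{*2}_{ab}\right) \tag{16.15}$$

and, by direct computation, that the measure of association for cross classifications, following this second route of generalization, is

$$\frac{\sum_a\sum_b p_{ab}\left[\left(\mathrm{I}^{*2}_{ab} - 2\,\mathrm{I}_{ab}\mathrm{III}_{ab} + \mathrm{III}^{*2}_{ab}\right) - \left(\mathrm{II}^{*2}_{ab} - 2\,\mathrm{II}_{ab}\mathrm{IV}_{ab} + \mathrm{IV}^{*2}_{ab}\right)\right]}{\sum_a\sum_b p_{ab}\left[\left(\mathrm{I}^{*2}_{ab} - 2\,\mathrm{I}_{ab}\mathrm{III}_{ab} + \mathrm{III}^{*2}_{ab}\right) + \left(\mathrm{II}^{*2}_{ab} - 2\,\mathrm{II}_{ab}\mathrm{IV}_{ab} + \mathrm{IV}^{*2}_{ab}\right)\right]}. \tag{16.16}$$

Note that (16.16) would be the same as (16.10) if the starred quantities were unstarred. The quantity (16.16) has the following properties: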


1. A necessary and sufficient condition that (16.16) be meaningful is the following pair of statements: (a) there are more than two positive $p_{a\cdot}$'s and more than two positive $p_{\cdot b}$'s; (b) the entire mass of the cross classification is not concentrated on a single row and column.

2. (16.16) lies between $-1$ and $1$ inclusive.

3. (16.16) is 1 if and only if $\Pr\{D^*\} = 0$. This means that every triple of positive cell probabilities, $p_{a_1b_1}$, $p_{a_2b_2}$, and $p_{a_3b_3}$, without ties between the subscript coordinates, has one of the subscript pairs concordant with the other two.

4. (16.16) is $-1$ if and only if $\Pr\{C^*\} = 0$. This means the same as above, but with "discordant" replacing "concordant."

5. (16.16) is zero in the case of independence, but not in general conversely. That (16.16) is zero under independence follows from general considerations, or it may be demonstrated in detail by setting $p_{ab} = p_{a\cdot}\,p_{\cdot b}$.
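The starred quantities of (16.14) only require a double sum over pairs of cells in a quadrant, so (16.16) can be sketched directly. The Python below (illustrative names; exact fractions; meant for small tables) checks two statements made in the text: that (16.16) vanishes under independence, and that for an untied sample taken as an $n \times n$ table it equals $[(n+1)/(n-2)]r_s - [3/(n-2)]t$.

```python
from fractions import Fraction
from itertools import combinations


def quadrant_cells(p, a, b):
    """Cells (0-indexed) in quadrants I-IV around (a, b), keyed 1..4."""
    q = {1: [], 2: [], 3: [], 4: []}
    for i in range(len(p)):
        for j in range(len(p[0])):
            if i > a and j > b:
                q[1].append((i, j))
            elif i < a and j > b:
                q[2].append((i, j))
            elif i < a and j < b:
                q[3].append((i, j))
            elif i > a and j < b:
                q[4].append((i, j))
    return q


def starred(p, cells):
    """Pr{two independent observations both fall in `cells`, tied in neither coordinate}."""
    return sum((p[i1][j1] * p[i2][j2] for i1, j1 in cells for i2, j2 in cells
                if i1 != i2 and j1 != j2), Fraction(0))


def measure_16_16(p):
    """The second cross-classification analog (16.16) of rho_s."""
    num = den = Fraction(0)
    for a in range(len(p)):
        for b in range(len(p[0])):
            if p[a][b] == 0:
                continue
            q = quadrant_cells(p, a, b)
            m = {k: sum((p[i][j] for i, j in q[k]), Fraction(0)) for k in q}
            s = {k: starred(p, q[k]) for k in q}
            c_part = s[1] - 2 * m[1] * m[3] + s[3]
            d_part = s[2] - 2 * m[2] * m[4] + s[4]
            num += p[a][b] * (c_part - d_part)
            den += p[a][b] * (c_part + d_part)
    return num / den


def ranks(v):
    """Ranks 1..n of the entries of v (assumes no ties)."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0] * len(v)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r


def sample_table(x, y):
    """n x n table with mass 1/n at (rank of X_i, rank of Y_i)."""
    n = len(x)
    p = [[Fraction(0)] * n for _ in range(n)]
    for a, b in zip(ranks(x), ranks(y)):
        p[a - 1][b - 1] = Fraction(1, n)
    return p


def spearman_rs(x, y):
    """r_s via (12.6)."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - Fraction(6 * d2, n * (n * n - 1))


def kendall_t(x, y):
    """t = (concordant - discordant) / C(n, 2) for untied data."""
    s = 0
    for i, j in combinations(range(len(x)), 2):
        prod = (x[i] - x[j]) * (y[i] - y[j])
        s += (prod > 0) - (prod < 0)
    return Fraction(s, len(x) * (len(x) - 1) // 2)
```

Individual per-cell terms can be negative here; only the totals, which are proportional to $\Pr\{C^*\}$ and $\Pr\{D^*\}$, need be nonnegative.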


For other approaches to ps in the cross classification case, see “Student” 
[80], Hoeffding [41], and Kendall [51]. 

Finally, we may ask what (16.16) becomes when we begin with a no-ties $n$-fold sample $\{(X_i, Y_i)\}$ and compute (16.16) for the corresponding $n \times n$ cross classification with $p_{ab} = 0$ unless $(a, b) = (X_i, Y_i)$, when $p_{ab} = 1/n$. We suppose that $n \ge 3$. Notice here that, for an $(a, b)$ with $p_{ab} = 1/n$ (and we need consider no others),

$$\mathrm{I}^{*2}_{ab} = \mathrm{I}_{ab}^2 - \mathrm{I}_{ab}/n,$$

for two observations in the first quadrant with respect to $(a, b)$ can tie marginally if and only if they fall in the same cell. Similar relationships hold for the other three quadrants. Hence for our particular case, (16.16) is


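$$\frac{\sum_a\sum_b p_{ab}\left[\left(\mathrm{I}_{ab} - \mathrm{III}_{ab}\right)^2 - \left(\mathrm{II}_{ab} - \mathrm{IV}_{ab}\right)^2\right] - \dfrac{1}{n}\sum_a\sum_b p_{ab}\left[\mathrm{I}_{ab} + \mathrm{III}_{ab} - \mathrm{II}_{ab} - \mathrm{IV}_{ab}\right]}{\sum_a\sum_b p_{ab}\left[\left(\mathrm{I}_{ab} - \mathrm{III}_{ab}\right)^2 + \left(\mathrm{II}_{ab} - \mathrm{IV}_{ab}\right)^2\right] - \dfrac{1}{n}\sum_a\sum_b p_{ab}\left[\mathrm{I}_{ab} + \mathrm{III}_{ab} + \mathrm{II}_{ab} + \mathrm{IV}_{ab}\right]}. \tag{16.17}$$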


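Now the first summations in numerator and denominator may be expressed in terms of ranks, just as we did for (16.10). The second summations are even simpler, since

$$\sum_a\sum_b p_{ab}\left(\mathrm{I}_{ab} + \mathrm{III}_{ab}\right) = \frac{2}{n^2}\binom{n}{2}P_c = \frac{n-1}{n}\,P_c, \qquad \sum_a\sum_b p_{ab}\left(\mathrm{II}_{ab} + \mathrm{IV}_{ab}\right) = \frac{2}{n^2}\binom{n}{2}P_d = \frac{n-1}{n}\,P_d,$$

where $P_c$ and $P_d$ refer to the $\{(X_i, Y_i)\}$ sample. Substituting, and recalling that $P_c + P_d = 1$ and $P_c - P_d = t$ in the no-ties case, we find that (16.17) is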
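$$\frac{\dfrac{4}{n^3}\sum_i R_{X_i}R_{Y_i} - \dfrac{(n+1)^2}{n^2} - \dfrac{n-1}{n^2}\,t}{\dfrac{1}{3}\,\dfrac{n^2 - 1}{n^2} - \dfrac{n-1}{n^2}} = \frac{\dfrac{n^2 - 1}{3n^2}\,r_s - \dfrac{n-1}{n^2}\,t}{\dfrac{n^2 - 1}{3n^2} - \dfrac{n-1}{n^2}} = \frac{n+1}{n-2}\,r_s - \frac{3}{n-2}\,t \qquad (\text{compare } (13.10)). \tag{16.18}$$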
In short, (16.16) for a no-ties sample, considered as a special cross classification, is just (13.10), our more natural estimator of $\rho_s$, for the sample itself.

Since this second generalization of $\rho_s$ is more reasonable than the first in its behavior under independence, we have an additional small argument in favor of (13.10) rather than $r_s$ as an estimator for $\rho_s$, the population quantity to which both converge stochastically.


17. HISTORICAL COMMENTS 


All the ordinal measures of association discussed above had their beginnings, 
to the best of my knowledge, in the last years of the nineteenth century, and 
in the early years of the present century. Forms of these ordinal measures 
entered the statistical literature only a few years after Francis Galton, Frank 
Edgeworth,* and Karl Pearson had fashioned the correlation coefficient as a 
tool of statistical analysis. Of course, formal probabilistic discussion of quanti- 
ties closely related to the correlation coefficient had existed for some time; for 
discussions of this early work see H. Walker [88, Chap. 5], and K. Pearson 
[68] and [69]. Pearson, in his biography of Galton, says [67, Vol. II, p. 392, 

* Edgeworth's contributions to the study and use of the correlation coefficient have not been as widely recognized as their merits may deserve. A discussion and summary of Edgeworth's work on correlation is given by Bowley [11, Chapter 9]. Two of Edgeworth's major articles on the subject are [26] and [27]. Bowley gives an extensive bibliography. K. Pearson [68] was critical of Edgeworth's work on correlation.
and Vol. IIIA, p. 3] that Galton had first tried correlation of ranks or grades
but had forsaken that approach in favor of what later became standard bi- 
variate normal correlation theory. This shift is quite understandable when we 
notice that Galton thought primarily in terms of linear regression. Nonetheless, 
Galton’s own measure of correlation was itself a function of marginal inter- 
quartile ranges and the slopes of the lines of conditional medians. See [67, 
Vol. IIIA, pp. 50-57], for details. 

Most of the first papers on ordinal measures of association were wholly in 
terms of the sample values; and indeed this emphasis still continues. It is only 
in relatively recent years that much attention has been given to the population 
meanings of the measures. In fact, one motivation for this paper is to stress 
the importance of population meanings. In contrast to this, the population 
meaning of the correlation coefficient has been emphasized from the time of 
Galton. Another feature of much work on sample measures of association has 
been its emphasis on the use of such sample statistics for estimating the correlation coefficient, under the assumption of normality. This is closely related
to the lack of discussion of population meaning; for we cannot estimate an 
undefined quantity. 

A quadrant measure of association was first proposed, so far as I am aware, 
by Sheppard in 1899 [77] who did consider its population meaning. However, 
Fechner, in 1897 had proposed a similar quantity but in a more complex con- 
text, that of double time series. See [29, pp. 386-98]; also discussion by 
O. Anderson [4, pp. 249-50]. Since then, quadrant measures have been dis- 
cussed from time to time, for example by Thorndike in the psychological 
literature [81], [82, p. 155], [83], and by Cochran [11a]. An extensive discussion of sampling theory for $q$, the sample analog of the quadrant measure, was given by Blomqvist [10].

It is undoubtedly possible to devise a variety of measures of association related to $q$; for example, one might modify $q$ so as to stress the "outer" parts of
the quadrants along lines suggested, in the hypothesis testing context, by Olm- 
stead and Tukey in their corner test [66]. 

The essential idea behind the measure of association $\tau$ was first suggested, I believe, by Fechner in 1897 [29, particularly pp. 372-5], although Fechner was mainly concerned, not with association in a bivariate population, but with association between two time-series. (For details on Fechner's work and its relation to $\tau$, see Risser-Traynard [72, p. 109 (Ed. 1), p. 6 (Ed. 2, Liv. II)], and
Salvemini [73].) Fechner’s suggestion for the double time-series case appears 
to have become known in France; see March [63] and Lenoir [55, p. 69], for 
example. Next, (2)P., a simple linear function of t, was proposed by G. F. 
Lipps in 1905 [60], and (3)¢ was discussed by him in 1906 [61]. Lipps obtained 
the first two moments of these quantities under the hypothesis of independence, 
and suggested their use as a test of independence. 

Meanwhile, in France, Binet and his colleagues had proposed, [8] and [9], 
measuring association by a function of the ranks, essentially the same function 
as that later called Spearman’s foot-rule. These French psychologists appear 
shortly to have stopped using such measures, see [78, p. 86], although continued 
interest is suggested by Binet’s footnote to [76, p. 492]. A few years later, in 
1904, the English psychologist, Spearman, became interested in sample
measures of association based on ranks, [78] and [79]. Spearman suggested
two measures. The first was r_s, motivated by him as the sample correlation
coefficient between the ranks. The second was an analogous quantity, but based
on Σ|R_{X_i} − R_{Y_i}|. This is the so-called Spearman foot-rule; I have not discussed
it because it does not seem to estimate a population quantity having a simple
probabilistic interpretation.

I return now to τ and t. The German educational psychologist, G. Deuchler,
considered t at length in a series of papers beginning in 1909: [20], [21],
[22], [23], and [24]. Deuchler began with Lipps' suggestions and carried the
discussion much further. His major paper appears to be that of 1914 [21].
In it, Deuchler considered the exact distribution of t under the hypothesis of
independence and obtained essentially the same recursion formula as that later
worked out by Kendall. By means of this, the exact distribution of t under
independence was developed numerically for n = 2, 3, 4, 5. Deuchler recognized
the now familiar relation between t and numbers of inversions, and he developed
the generating function for the distribution of the number of inversions under the
hypothesis of independence. The question of ties was dealt with at length, in
a way related both to that later used by Kendall and to that discussed in
section 15. The heavily tied case was considered here and also in [24]; for more
details see the comments in [36]. A method for simplifying the computation
of t was worked out. Lipps' expressions for mean and variance under
independence were rederived by the generating function method. Then Deuchler
turned to non-independence, and gave, as the mean and variance of t, τ and

$$\frac{2(1-\tau^2)\left[2n+5+(2n-1)\tau\right]}{3n(n-1)(\tau+3)},$$

respectively.

The above variance expression is different from our (11.3) and it appears to 
be in error. While no derivation for it was given in [21], the reader was referred 
for one to a monograph by Deuchler in which the material of his journal 
articles was expounded in greater detail. This monograph has never been pub- 
lished, but, through the courtesy of Professor Deuchler’s widow, I have been 
permitted to examine it and to retain a microfilm copy of it. In the section on 
non-independence, Deuchler restricted himself to what he calls “regular de- 
pendence,” a very special kind of dependence, and, under this restriction, 
II.. is related to II, in such a way that Deuchler’s variance expression and 
(11.3) are the same. 
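Deuchler's route to the exact null distribution, through inversions and their generating function, can be sketched in a few lines. The sketch assumes no ties and uses t = 1 − 4I/[n(n − 1)], where I is the number of inversions among the Y ranks once the pairs are ordered by X; under independence all n! orderings are equally likely, so the inversion counts are the coefficients of ∏_{j=1}^{n}(1 + q + ··· + q^{j−1}). The function names are hypothetical, and this is an illustration of the idea rather than a transcription of Deuchler's or Kendall's tables.

```python
from fractions import Fraction


def inversion_counts(n):
    """counts[i] = number of orderings of n items with exactly i inversions.

    Builds the coefficients of prod_{j=1}^{n} (1 + q + ... + q^(j-1)) one factor
    at a time: placing item j among the first j - 1 adds 0, 1, ..., j - 1 inversions.
    """
    counts = [1]  # one item: a single ordering with no inversions
    for j in range(2, n + 1):
        new = [0] * (len(counts) + j - 1)
        for inv, ways in enumerate(counts):
            for extra in range(j):
                new[inv + extra] += ways
        counts = new
    return counts


def null_distribution_of_t(n):
    """Exact distribution of t under independence, assuming no ties."""
    counts = inversion_counts(n)
    total = sum(counts)          # equals n!
    pairs = n * (n - 1) // 2     # number of pairs (i, j) with i < j
    # t = (concordant - discordant) / pairs = (pairs - 2 * inversions) / pairs
    return {Fraction(pairs - 2 * inv, pairs): Fraction(ways, total)
            for inv, ways in enumerate(counts)}


if __name__ == "__main__":
    dist = null_distribution_of_t(4)
    for value in sorted(dist):
        print(value, dist[value])
```

At τ = 0 Deuchler's variance expression above reduces to 2(2n + 5)/[9n(n − 1)], and the exact distributions produced by this sketch have exactly that variance, which makes a convenient check; exact fractions avoid rounding in such comparisons.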

The meaning of "regular dependence" seems to be the following: Let Ỹ_1,
Ỹ_2, ··· be the Y_i's corresponding to the ordered X_i's in an n-fold random
sample; that is, Ỹ_1 is the Y observation associated with min X_i, etc. Now let
p_1, p_2, ··· be a sequence of nonnegative numbers such that Pr{Ỹ_j is the lowest
Y observation} = p_j / Σ_{i=1}^{n} p_i. Assume p_1 > 0. Conditionally on Ỹ_j being the
lowest Y, assume that the probability that Ỹ_{j'} be lowest among the remaining
n − 1 Y's is p_{j'} / Σ_{i=1}^{n−1} p_i (if j' < j) or p_{j'−1} / Σ_{i=1}^{n−1} p_i (if j' > j). And so on
seriatim. In general,

$$\Pr\{\tilde Y_{j_1} < \cdots < \tilde Y_{j_{k-1}} < \tilde Y_{j_k} < \text{other } \tilde Y\text{'s} \mid \tilde Y_{j_1} < \cdots < \tilde Y_{j_{k-1}} < \text{other } \tilde Y\text{'s}\} = \frac{p_s}{\sum_{i=1}^{n-k+1} p_i},$$

where s = j_k minus the number of inequalities j_1 < j_k, j_2 < j_k, ···, j_{k−1} < j_k
that hold for the particular sequence of j's of interest. The heuristic idea is to
work conditionally at the kth step on the remaining (X_i, Y_i)'s and to suppose
that p_1, p_2, ···, p_{n−k+1} give the relative probabilities that the Y's of the
remaining observations be least. This is the "regularity."

One can compute the p's in terms of τ from the above structure. For example,
if p_1 = 1, then p_2 = (1 − τ)/(1 + τ), p_3 = (1 − τ)/(1 + τ)², p_4 = (1 − τ)(3 + τ)
/[3(1 + τ)³]; in general Deuchler gives an expression equivalent to

$$p_i = \frac{\prod_{j=0}^{i-2}\left[(j+1) + (j-1)\tau\right]}{(i-1)!\,(1+\tau)^{i-1}}.$$

Now, for this "regular dependence" structure, Deuchler's variance is correct
and coincides with (11.3). However, the structure itself may be criticized, not
only for its great restrictiveness, but also because, if one adds natural
symmetrical assumptions for greatest (instead of least), then τ must be zero. The idea
of regular dependence, nevertheless, seems interesting, and one reason for
describing it here is that some modification of it might be a useful restriction
on the bivariate distributions of interest. The search for such useful restrictions
has recently been exemplified by Konijn [53]. Another article in the same
direction is that of Barton and David [5].

Deuchler also considered in [20] the approximation of the distribution of t 
in general. A critical discussion of the two measures proposed by Spearman 
was presented, and finally a detailed discussion of 2×2 tables appeared. The
2×2 table question was also extensively discussed in [22]. In [23] Deuchler
applied various measures of association, including t, to the study of
arithmetical skills in children. A biographical note about Deuchler is given in [54].

Although the suggestions of Lipps and Deuchler were discussed in the 
German psychometric literature of the time (see, for example, Betz [7], Wirth 
[91], and Valentiner [85] and [86]), these suggestions do not seem to have 
aroused much interest in other countries or in other fields. For example, I have 
found no references to them in the standard German statistical treatises, such 
as those of E. Czuber. However, in Switzerland, especially in medical and psy- 
chiatric work, Lipps’ original work does seem to have had some influence. I have 
run across a constellation of articles by Swiss authors, including Bersot [6], 
Haemig [39], de Montet [18] and [19], Piguet [70] and Michaud [64], in 
which use of Lipps’ suggestion was made. 

In 1924, Esscher [28] independently suggested τ and t as measures of
association, and gave a clear statement of the population meaning of τ. Esscher
restricted his considerations to the bivariate normal case, but this is inessential
for his basic statements. In the bivariate normal case, Esscher obtained the
variance of sin(πt/2) and an inequality for this variance.

Lindeberg, a little later, [56] and [57], considered τ and t without the
normality restriction, gave a clear statement of the meaning of τ, and presented,
so far as I know for the first time, the variance of t in general.

In 1928 or before, S. D. Holmes proposed t as an approximation to the
ordinary correlation coefficient. Holmes' approach has been described by
P. Sandiford [75, Appendix B], and recently by H. D. Griffin [37]. The approach is
graphical in nature and represents a neat way of computing t that seems to
be generally unknown.

The quantity II, was independently proposed by de Finetti in 1937 [17] as 
a measure of association. De Finetti motivated and interpreted II, in essentially 
the same way as does section 4 of this paper. 

In 1938, Kendall [50] independently proposed t, and began a series of papers 
dealing extensively with ordinal measures of correlation. A thorough discussion 
is given in Kendall’s monograph [51]. Kendall, and other English workers in 
this area, have written extensively on one topic not treated here at all, the 
distribution of t and r_s when sampling without replacement from a finite
population.

Hoeffding, in a fundamental paper on asymptotic distribution theory [44],
states in passing the population interpretations of both τ and ρ_s. Kendall [51,
2nd ed., chap. 9] briefly discusses the population interpretations of τ and ρ_s,
apparently basing his discussion on the doctoral thesis of R. M. Sundrum.

Other measures of association have been suggested from time to time. For 
example, Lipmann [59] proposed various measures based on the order statistics
of |R_{X_i} − R_{Y_i}|; but see critical comments by Wirth [92]. Von Schelling [87]
proposed a curious kind of measure that does not seem to have aroused interest
elsewhere. About forty years ago Gini [34] proposed a symmetrized version of
Spearman's foot-rule that has been frequently discussed in the Italian
literature since then; for example, see Amato [3] and Salvemini [73] and [74]. A
crude form of this quantity had been discussed by Tönnies in 1909; see [36]
for details. Related discussions have been given by Watkins [89] and Julin
[47]. There is an extensive Italian literature, centered about the work of Gini,
on the whole subject of measures of association, and there have been a number 
of interesting contributions by Fréchet; for more detail on these and other con- 
tributions, and for bibliography, I refer to [36]. 

Measures of association based upon the Shannon-Wiener information con- 
cept have been proposed in recent years. For a discussion of these I refer to a 
paper by E. H. Linfoot [58]. Fieller, Hartley, and Pearson [30a] discuss the 
sample ordinal measure of association obtained by computing the correlation 
coefficient, not of the observations, but of the corresponding mean normal order 
statistics. This is an application of a proposal made earlier by Fisher and Yates. 

Measures of partial and multiple association that are closely related to the 
bivariate ordinal measures of association discussed here have been proposed 
from time to time. Discussions of this topic will be found in Kendall’s mono- 
graph [51] and in an article by Goodman and me [35]. 

All of the measures of association discussed in detail in this paper have the
property that they may be zero even when X and Y are stochastically
dependent. This has led some authors to seek measures of association that are
zero if and only if X and Y are independent. In particular, the works of Fréchet, 
Gini, Steffensen, Cramér, and Pollaczek-Geiringer cited in [36] bear on this 
point. Hoeffding’s [40, 41, and 42] should be mentioned here, as well as a later 
paper [45]. See also Féron [30]. 

Daniels [13] suggested a formal synthesis for certain ordinal sample measures
of association by considering them as sample correlation coefficients from an
n²-fold sample constructed from the original n-fold sample, (X_i, Y_i), i = 1, ···,
n, via functions a and b: (a(X_i, X_j), b(Y_i, Y_j)); i, j = 1, ···, n. It is required
that a(X_i, X_i) = b(Y_i, Y_i) = 0 and that a(X_i, X_j) = −a(X_j, X_i) and b(Y_i, Y_j)
= −b(Y_j, Y_i). Konijn [53, pp. 306-7] has given, in concise terms, a similar
synthesis. A related synthesis, in terms of general measures of dispersion, has
been suggested by Amato [2] and [3].
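Daniels' synthesis can be made concrete. The sketch below computes the correlation of the n² derived pairs for user-supplied score functions a and b, applied (as assumed here) to the X's and the Y's respectively; all helper names are hypothetical. Because the scores vanish on the diagonal and are antisymmetric, their means are zero, so the correlation reduces to Σ a_ij b_ij / √(Σ a_ij² Σ b_ij²). With sign scores it gives Kendall's t, and with rank differences it gives Spearman's r_s, both assuming no ties.

```python
import math


def daniels_gamma(xs, ys, a, b):
    """Correlation of the n*n derived pairs (a(X_i, X_j), b(Y_i, Y_j))."""
    n = len(xs)
    sab = saa = sbb = 0.0
    for i in range(n):
        for j in range(n):
            aij, bij = a(xs[i], xs[j]), b(ys[i], ys[j])
            sab += aij * bij
            saa += aij * aij
            sbb += bij * bij
    # antisymmetric scores sum to zero, so no centering is needed
    return sab / math.sqrt(saa * sbb)


def sign_score(u, v):
    # +1, 0 or -1 according to the order of the pair; yields Kendall's t (no ties)
    return (v > u) - (v < u)


def ranks(values):
    # ranks 1..n, assuming no ties
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r


def rank_difference(u, v):
    # applied to ranks, this score yields Spearman's r_s
    return v - u
```

For example, with x = [1, 2, 3, 4, 5] and y = [2, 1, 4, 3, 5], sign scores give t = 0.6 and rank differences applied to `ranks(x)` and `ranks(y)` give r_s = 0.8.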

Kendall's monograph [51] contains an extensive discussion of recent
developments regarding ordinal measures of association, and the reader may be 
referred to this book for a discussion of recent work not mentioned here. 

There has been an enormous amount of statistical work with the goal of 
making precise and useful our intuitive notions of stochastic association. I have 
tried to survey with reasonable completeness that portion of this work dealing 
with ordinally invariant measures of association, but the literature is so exten- 
sive and scattered that this survey is almost surely not completely com- 
prehensive. 
CURTAILED SAMPLING FOR VARIABLES


Norman R. GARNER 
Aerojet-General Corporation 


If the criterion for lot acceptance after testing n sample items is X̄ + ks
< L, a sufficient condition for rejection after r tests is

$$\bar X_r + \sqrt{V_r B (k^2C - A^2)/C} > L,$$

where X̄_r and V_r are the average and variance of the r observations,
A = (n − r)/n, B = (r − 1)/(n − 1) and C = r(n − r)/[n(n − 1)]. If the acceptance
criterion is X̄ − ks > L, then rejection is certain if and only if X̄_r
minus the square root is less than L.


The usefulness of curtailed sampling is evident when testing is costly, when
a test takes a long time, or when failure of the test characteristic
indicates a potentially hazardous condition. For example, excessive chamber
pressure of a rocket might cause a blow-up which, in turn, might extensively 
damage a test stand. Thus, if it can be shown that a lot will be rejected with 
certainty after r of the required n tests have been completed, monetary savings 
are assured by eliminating unnecessary tests and having a decision in the 
shortest time possible. 

Single sampling plans, however, normally permit rejection of a lot only after 
a specified number of observations have been made. Nevertheless, the possi- 
bility for curtailment has long been recognized for both attributes and variables 
[2] [3] [6]. In fact, precise procedures can be established for attributes. For 
illustration, suppose n = 100, and c=5 for a particular plan being used and that 
the fifth defect was observed on the 66th item inspected. This particular lot can 
be rejected without inspecting further items. 

This paper presents a solution for curtailment of variables sampling for 
single sampling plans such as those listed in reference [5] and discussed in 
reference [1].¹

Briefly, suppose that the specification for lot acceptance places a limitation 
on the characteristic being tested, so that when the results of the sample have 
been obtained, the relationship, 


$$\bar X + ks < L_1,$$

must hold, where X̄ is the arithmetic mean of the n observed values of the
characteristic being measured, s is the standard deviation, and k is a positive 
constant assigned by the specification. 

After a part of the specified sample has been tested, can it be shown that the 
lot will be rejected regardless of the outcome of the remaining tests? If so, the 
lot can be rejected immediately. 

Wallis [6, pp. 14-15] discusses the simplest example of curtailment. Thus, if
the sum of fewer than n observations exceeds nL_1, it is clear that for the





1 The need for curtailment for variables was suggested to the author by Commander William J. Corcoran,
Special Projects, B of Ord, U. S. Navy, with special reference to the example which is given.
entire n items X̄ will exceed L_1, and so X̄ + ks will exceed it still further. This
procedure requires that negative observations be impossible, or at least that the
characteristic being inspected have a known minimum possible value. Wallis also
suggested the use of more complicated procedures, such as the one presented here.
We ask, "What is the minimum value of L_n = X̄ + ks that can be obtained, whatever
the values of the X's for the remaining tests?" If this value exceeds L_1, then
the lot can be rejected without further testing.

It is evident that we are discussing only a sufficient condition for rejection. That
is, if there does not exist a set of X's which would allow acceptance, we can
be sure that the lot will be rejected; however, if such a set of X's exists, there is
no guarantee that the lot will be accepted. Of course, the probability of
obtaining such a set of X's could be considered: if this probability were small, the
lot could be rejected; contrarily, if it were high, the lot could be accepted. This
situation will not be discussed in this note.


DETERMINATION OF THE MINIMUM L_n


Our problem, then, is to find the most advantageous set of the n − r remaining
X's after r tests have been completed. If this set of X's gives an L_n value greater
than L_1, the lot can be rejected immediately since it is certain that the lot will
not be accepted, whatever the values of the X's for the remaining tests.

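In order to find this set, let

$$L_n = \bar X_n + k s_n \qquad \text{(for } n \text{ observations)}$$

with

$$\bar X_n = \frac{1}{n}\left[\sum_{i=1}^{r} X_i + \sum_{j=r+1}^{n} X_j\right] = \frac{r}{n}\bar X_r + \frac{1}{n}\sum_{j=r+1}^{n} X_j$$

and

$$s_n^2 = V_n = \frac{1}{n-1}\sum_{j=1}^{n}(X_j - \bar X_n)^2 = \frac{1}{n-1}\left[\sum_{i=1}^{r}(X_i - \bar X_r)^2 + \sum_{j=r+1}^{n}(X_j - \bar X_{n-r})^2 + \frac{r(n-r)}{n}(\bar X_r - \bar X_{n-r})^2\right], \qquad (1)$$

where X̄_{n−r} denotes the mean of the n − r remaining observations.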

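We then set

$$\frac{dL_n}{dX_i} = 0, \qquad i = r+1, r+2, \ldots, n,$$

where

$$\frac{dL_n}{dX_i} = \frac{\partial \bar X_n}{\partial X_i} + k\,\frac{ds_n}{dV_n}\,\frac{\partial V_n}{\partial X_i}.$$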
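That is, we are finding a set of X's which gives a minimum L_n. This will be the
most advantageous set. For a given i,

$$\frac{\partial \bar X_n}{\partial X_i} = \frac{1}{n}, \qquad \frac{ds_n}{dV_n} = \frac{1}{2\sqrt{V_n}}, \qquad \frac{\partial V_n}{\partial X_i} = \frac{2(X_i - \bar X_n)}{n-1},$$

so that we have n − r equations in n − r unknowns such that

$$\frac{dL_n}{dX_i} = \frac{1}{n} + \frac{k(X_i - \bar X_n)}{(n-1)\sqrt{V_n}} = 0.$$

However, this is true for any X_i; it therefore follows that for all i

$$X_i = \bar X_n - \frac{(n-1)\sqrt{V_n}}{nk},$$

so that

$$X_{r+1} = X_{r+2} = \cdots = X_n = X.$$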


Thus, throughout the remainder of the report, X without a subscript refers to 
any one of the n—r remaining observations. 

This result can be obtained also by the argument that for any set of X’s, all 
not equal, there is a set of X's which gives a smaller L_n. Examination of
Equation (1) reveals that we can find a set of X's which would not alter the mean,
X̄_{n−r}, but which would minimize the variance if all the n − r remaining X's were
equal; that is, the second term could be eliminated. This would give the smallest 
value of the standard error possible for a given mean value, thus giving a 
smaller L value than that set of unequal X’s. 

We now want to find X in terms of the first r observations. Summing over 
the n − r equations, we have

$$\sum_{i=r+1}^{n}\frac{dL_n}{dX_i} = \frac{n-r}{n} + \frac{k(n-r)(X - \bar X_n)}{(n-1)\sqrt{V_n}} = 0,$$

which reduces to

$$\frac{1}{n} + \frac{k(X - \bar X_n)}{(n-1)\sqrt{V_n}} = 0.$$


Rearranging and squaring, we have 


$$(n-1)\sum_{j=1}^{n}(X_j - \bar X_n)^2 = n^2k^2(X - \bar X_n)^2. \qquad (2)$$
But remembering that all X_i (i = r+1, r+2, ···, n) are equal, so that X = X̄_{n−r},
we have

$$X - \bar X_n = \frac{r}{n}(X - \bar X_r),$$

$$\sum_{j=1}^{n}(X_j - \bar X_n)^2 = \sum_{i=1}^{r}(X_i - \bar X_r)^2 + \frac{r(n-r)}{n}(X - \bar X_r)^2.$$

Substituting these results in Equation (2), we find that L_n is a minimum when

$$X = \bar X_r - \left(\frac{(n-1)(r-1)nV_r}{nk^2r^2 - r(n-r)(n-1)}\right)^{1/2}, \qquad (3)$$

where V_r = Σ_{i=1}^{r}(X_i − X̄_r)²/(r − 1) is the variance of the first r observations.
Note that a solution cannot be obtained unless the denominator of the
quantity enclosed in parentheses is greater than zero. This will be true when

$$r > \frac{n(n-1)}{nk^2 + (n-1)}.$$


This result has been restated and simplified by William Kruskal [4] so that
rejection is certain if and only if

$$\bar X_r + \sqrt{V_r B (k^2C - A^2)/C}\ \text{ is real and } > L_1, \qquad (4)$$

where

$$A = \frac{n-r}{n}, \qquad B = \frac{r-1}{n-1}, \qquad C = \frac{r(n-r)}{n(n-1)}.$$
If the acceptance criterion were

$$\bar X - ks > L_1,$$

it would be necessary to find that set of X's which maximizes L_n = X̄_n − ks_n. This
would be analogous to the present problem, and the optimum value for X would be

$$X = \bar X_r + \left(\frac{(n-1)(r-1)nV_r}{nk^2r^2 - r(n-r)(n-1)}\right)^{1/2}.$$

Using Kruskal's simplification, rejection is certain if and only if

$$\bar X_r - \sqrt{V_r B (k^2C - A^2)/C}\ \text{ is real and } < L_1.$$
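Criterion (4) and its mirror image translate directly into a few lines of code. The sketch below assumes the limit L_1 and the constant k are known and that at least r ≥ 2 observations are in hand, so that V_r (divisor r − 1) is defined; the function names are hypothetical and the code is an illustration, not a procedure taken from the paper.

```python
import math
import statistics


def kruskal_root(v_r, n, r, k):
    """sqrt(V_r * B * (k^2 C - A^2) / C), or None when the square root is not real."""
    A = (n - r) / n
    B = (r - 1) / (n - 1)
    C = r * (n - r) / (n * (n - 1))
    radicand = v_r * B * (k * k * C - A * A) / C
    return math.sqrt(radicand) if radicand >= 0 else None


def certain_rejection(first, n, k, limit, upper=True):
    """True if the lot must be rejected after the r = len(first) observations (r >= 2).

    upper=True:  acceptance requires mean + k*s < limit (criterion (4)).
    upper=False: acceptance requires mean - k*s > limit.
    """
    r = len(first)
    xbar_r = statistics.mean(first)
    v_r = statistics.variance(first)  # divisor r - 1
    root = kruskal_root(v_r, n, r, k)
    if root is None:
        return False  # no certain rejection is possible at this stage
    return xbar_r + root > limit if upper else xbar_r - root < limit


def most_advantageous_x(first, n, k):
    """Equation (3): common value of the n - r untested items that minimizes L_n."""
    r = len(first)
    xbar_r = statistics.mean(first)
    v_r = statistics.variance(first)
    denom = n * k * k * r * r - r * (n - r) * (n - 1)
    if denom <= 0:
        raise ValueError("no minimum: r must exceed n(n-1)/(n k^2 + n - 1)")
    return xbar_r - math.sqrt((n - 1) * (r - 1) * n * v_r / denom)
```

With the rocket data of the application (n = 12, k = 4.15, X_1 = 193, X_2 = 84), `certain_rejection([193.0, 84.0], 12, 4.15, 200.0)` should return True, with X̄_r plus the root close to 221.1 psi and `most_advantageous_x` close to 102.5 psi.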





AN APPLICATION 


The following example actually arose during acceptance testing. The data,
however, are coded. A sample of 12 rockets is to be tested for maximum
pressure. Specification requires that

$$\bar X + 4.15s < 200 \text{ psi}.$$

Two rocket motors have been tested and the observed results for maximum
pressure are X_1 = 193 psi and X_2 = 84 psi. Thus we have

n = 12
r = 2
k² = 17.2225
X̄_r = 138.5 psi
V_r = 5940.5 (psi)².


Substituting these values in Equation (3), the most advantageous value of X
is approximately 102.5 psi. Setting all of the remaining ten X's equal to this
value, we find that

X̄_n = 108.5 psi
s_n = 27.13 psi

and thus the minimum possible L_n is

L_n = X̄_n + ks_n = 221.1 psi.


This same result is obtained using Kruskal’s simplified procedure (4). Thus, 
it is not necessary to compute X explicitly. 

Since this value of L_n, the minimum possible, is larger than the allowable
200 psi, it will be impossible to accept this lot, regardless of the outcome of
the remaining ten tests. There is no need to continue testing and the lot is
rejected immediately.

Of course, the same results could be obtained by setting X equal to various
values and finding the minimum L_n over the chosen range of X values. For example,
several L_n values have been tabulated (in Table 866) for various X's in the given
example. It can be seen that the minimum value of L_n is around 221 psi, which
is, of course, larger than the allowed value; thus, we reject the lot immediately.


TABLE 866
L VALUES FOR VARIOUS X'S (psi)

L
224.
221.
221.
221.
224.
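The tabulation approach can be reproduced in outline. The X values behind Table 866 are not given here, so the grid below (80 to 130 psi in steps of 0.5) is a hypothetical choice; the sketch simply evaluates L_n = X̄_n + ks_n with the ten untested items all set equal to each trial X, using the coded data of the example.

```python
import statistics


def l_n(first, x, n, k):
    """L_n = mean + k * s when the n - r untested items are all set equal to x."""
    sample = list(first) + [x] * (n - len(first))
    return statistics.mean(sample) + k * statistics.stdev(sample)  # stdev uses divisor n - 1


first = [193.0, 84.0]                        # coded pressures from the example
grid = [80.0 + 0.5 * i for i in range(101)]  # hypothetical grid: 80 to 130 psi
values = {x: l_n(first, x, n=12, k=4.15) for x in grid}
best_x = min(values, key=values.get)
print(f"minimum L_n = {values[best_x]:.1f} psi at X = {best_x} psi")
```

On this grid the minimum should come out at about 221.1 psi near X = 102.5 psi, in agreement with Equation (3); the curve is flat near its minimum, which is why several neighboring trial values give nearly the same L.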
SUPERIORITY OF SEQUENTIAL SAMPLING


In the situations for which curtailed sampling might be considered, it seems 
as though some sort of sequential sampling would be superior. If it is at all 
possible, it is suggested that a sequential plan such as indicated by Wallis [6, 
pp. 83-5] be adopted. If it is contractually or administratively impossible, then 
the procedure for curtailed sampling can be used. 
ON THE STUDENTIZED SMALLEST CHI-SQUARE*


K. V. RAMACHANDRAN 
Demographic Training & Research Centre, Bombay





Defining the studentized smallest chi-square v with parameters t, m,
k as the ratio of m times the smallest of k independent chi-square
variates, each with t degrees of freedom (d.f.), to t times another independent
chi-square variate with m d.f., the lower 5% points of v are tabulated
for different values of t, m, and k. Some possible applications of v
are also indicated.


1. INTRODUCTION AND SUMMARY


This paper deals with the distribution of the studentized smallest chi-square,
denoted by v, with parameters t, m, k. Lower 5% points of v are given for 
different values of the parameters. 


2. THE PROBLEM 
Suppose we have k statistics

$$F_i = \frac{S_i/t_i}{S/m}, \qquad i = 1, 2, \ldots, k, \qquad (1)$$

where S_1, S_2, ···, S_k and S are mutually independent, S_i/σ² having a χ²
distribution under the null hypothesis with t_i degrees of freedom and S/σ² having
a χ² distribution with m d.f. In the case when t_i = t (i = 1, 2, ···, k) we are
interested in certain situations (mentioned in Section 3) in evaluating a value of
V such that

$$\Pr[F_i > V;\ i = 1, 2, \ldots, k] = 1 - \alpha, \qquad (2)$$

where α is a given positive constant such that 0 < α < 1. It is evident that (2)
can be rewritten as

$$\Pr\left[\frac{S_i}{S}\cdot\frac{m}{t} > V;\ i = 1, 2, \ldots, k\right] = \Pr\left[\frac{S_{\min}}{S}\cdot\frac{m}{t} > V\right] = 1 - \alpha. \qquad (3)$$

In order to obtain a V such that (3) is satisfied, we need the distribution of
the studentized smallest χ². The distribution can be obtained in a manner
similar to that given in [4]. Using methods given in [4], lower 5% points of

$$v = \frac{m}{t}\cdot\frac{S_{\min}}{S}$$

are given for different values of t, m and k.
Finney [1] gives an expression for

$$\Pr\left[\frac{S_{\min}}{S} > a\right]$$


* Presented under a different title at a meeting of the Institute of Mathematical Statistics, Seattle, Washington,
August 1956.
in a complicated form. He does not give any numerical tables in his paper.
Nair [3] gives a method by which the lower percentage points of the studentized
smallest chi-square, v, can be calculated when t = 1 with the use of an auxiliary table.
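A simulation gives an independent, if approximate, check on lower percentage points of v. The sketch below (function names hypothetical) estimates the lower α point of v by Monte Carlo, drawing each χ² variate with d d.f. as a gamma variate with shape d/2 and scale 2. For t = 2 it also uses a closed form that follows from χ² with 2 d.f. being exponential: conditional on S, Pr[v > V] = exp(−kVS/m), and averaging over S gives (1 + 2kV/m)^{−m/2}, which is then solved for V. This is an assumption-laden illustration, not the method of [4] used for the tables.

```python
import random


def draw_v(t, m, k, rng):
    """One draw of v = (m/t) * min(S_1, ..., S_k) / S, taking sigma^2 = 1."""
    s_min = min(rng.gammavariate(t / 2, 2.0) for _ in range(k))  # chi-square, t d.f.
    s = rng.gammavariate(m / 2, 2.0)                              # chi-square, m d.f.
    return (m / t) * s_min / s


def lower_point_mc(t, m, k, alpha=0.05, draws=100_000, seed=1):
    """Monte Carlo estimate of V with Prob[v > V] = 1 - alpha."""
    rng = random.Random(seed)
    sample = sorted(draw_v(t, m, k, rng) for _ in range(draws))
    return sample[int(alpha * draws)]


def lower_point_t2(m, k, alpha=0.05):
    """Closed form for t = 2: solve (1 + 2kV/m)^(-m/2) = 1 - alpha for V."""
    return (m / (2 * k)) * ((1 - alpha) ** (-2 / m) - 1)


if __name__ == "__main__":
    print(lower_point_t2(10, 3), lower_point_mc(2, 10, 3))
```

The closed form makes a convenient benchmark for the simulation, and its values can be compared with the t = 2 entries of the tables that follow.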


3. APPLICATIONS OF THE STUDENTIZED SMALLEST CHI-SQUARE


The studentized smallest chi-square is encountered in a wide variety of 
problems. There is presented here an account in which the main applications 
are mentioned briefly, according to fields of interest; this is not intended to be 
a complete catalogue of all possible uses. 

a) In field experiments, the smallest of k variances tested may become
significantly small as compared to the error variance if the (t + 1) groups of plot values
from which the t d.f. of that variance were obtained showed a high negative
intra-class correlation. Wishart [6] gives an example where a test of significance
of the studentized smallest chi-square when t = 1 is appropriate.

b) A rather interesting use for the test of significance of the studentized 
smallest chi-square, when t=1, can be found in certain methods of statistical 
control in sample surveys devised by Mahalanobis [2]. If, say, we are sampling 
agricultural fields in k districts and send two investigators to each district to col- 
lect half the quota of sample units into two randomly selected half samples A 
and B, a comparison of the A and B mean values for each district should give a 
clue as to whether the data have been properly collected. If one of the A−B
differences is very large and the variance ratio significant, it is usual to conclude
that the two investigators concerned did not carry out the instructions in the
same manner or that some other personal bias had crept in. On the other hand,
if one of the k differences between (A, B) pairs is surprisingly small and turns 
out to give a significant smallest variance ratio, it may perhaps arouse suspicion 
that the two investigators consulted each other and dishonestly made the means 
of their data agree. This is the negative side of the argument in favor of a 
test of significance involving the studentized smallest chi-square. On the posi- 
tive side, it may often turn out that the smallest variance ratio is not signifi- 
cantly small, avoiding awkward aspersions on the reliability of the investigators. 

c) Sometimes in quality control work we are interested in comparing k pro- 
duction processes or machines with a standard or control process or machine. 
If the criterion of comparison is the variances of the yields or output then a 
procedure for this is given by Ramachandran and Khatri [5]. This procedure 
utilizes the lower percentage points of the studentized smallest chi-square. 

It can easily be seen that the procedure can be applied in similar situations 
in other fields like agriculture, biology, etc. 


4. CONCLUDING REMARKS 


Using similar methods, percentage points of the studentized largest chi- 
square for different values of t, m, and k are computed and will be offered for 
publication soon. 
TABLE 870

LOWER 5% POINTS OF v WHEN t=1, FOR DIFFERENT VALUES
OF m AND k

5 or More .004 .001 .0005 .0003 .0002 .0001 .00008 .00006

TABLE 870a

LOWER 5% POINTS OF v WHEN t=2, FOR DIFFERENT VALUES OF
m AND k

5 to 10 .052 .026 .017 .013 .010 .008
12 or More .051 .025 .017 .013 .010 .008

TABLE 870b

LOWER 5% POINTS OF v WHEN t=3, FOR DIFFERENT VALUES
OF m AND k

TABLE 870c

LOWER 5% POINTS OF v WHEN t=4, FOR DIFFERENT VALUES
OF m AND k
TABLE 871

LOWER 5% POINTS OF v WHEN t=6, FOR DIFFERENT VALUES
OF m AND k

.159

TABLE 871a

LOWER 5% POINTS OF v WHEN m=, FOR DIFFERENT VALUES
OF t AND k

TABLE 871b

WHEN k=3, FOR DIFFERENT VALUES
t AND m
TABLE 872

LOWER 5% POINTS OF v WHEN k=2, FOR DIFFERENT VALUES OF
t AND m

.068 .111 .176 .219
.069 .113 .182 .229
.069 .114 .185 .235
.070 .115 .188 .241
.070 .118 .196 .252
.070 .119 .199 .256
.072 .122 .207 .273

TABLE 872a

LOWER 5% POINTS OF v WHEN k=3, FOR DIFFERENT VALUES
OF t AND m

.096
.098
THE ESTIMATION OF THE PARAMETERS OF A LINEAR
REGRESSION SYSTEM OBEYING TWO 
SEPARATE REGIMES* 


Richard E. QUANDT
Princeton University 


In attempting to estimate the parameters of a linear regression system 
obeying two separate regimes, it is necessary first to estimate the 
position of the point in time at which the switch from one regime to 
the other occurred. The suggested maximum likelihood estimating
procedure is based upon a direct examination of the likelihood function.
An asymptotic and a small-sample test are suggested for testing the
hypothesis that no switch occurred against the single alternative that
one switch took place. The procedure is illustrated with a sampling 
experiment in which the true switching point is correctly estimated. 


1. INTRODUCTION 

Economic variables may sometimes be connected by linear relations with
the property that the parameters of the relationship are subject to
discontinuous changes. As an example consider the consumption function
C = aY + b. Aggregate consumption depends upon the level of aggregate
(disposable) income. In addition, it may be hypothesized that consumption
depends non-linearly on other factors such as the state of expectations
concerning the future of the economy, the volume of installment buying, the level
of the interest rate, etc. These other variables may have the effect of altering
the parameters of the consumption function in the following fashion: when the
critical outside variable (say, the rate of interest) i satisfies i < i*, then
C = a_1Y + b_1, and when i > i*, then C = a_2Y + b_2, where i* is the critical
level of the outside variable in question.

In general, one may not be able to identify the critical outside variable and,
as a result, one may not be able to state at what time the system
C = aY + b changes from one regime to the other. The objective of this paper
is to indicate a possible estimating procedure for the switching point under
conditions when it is known that the time period under consideration contains
a single switch.¹ It is assumed (1) that time series data free of errors of
observation are available for the two variables, (2) that the error terms are
independent of each other, and (3) that the error terms are independent of the
explanatory variable.²

* I am indebted to Professors F. Anscombe, R. Dorfman and F. Stephan and the referees of this paper for
criticism and helpful suggestions. I am also indebted to Professor John S. Chipman for originally suggesting the
problem. The responsibility for errors is, of course, mine.

1 An analogous but more cumbersome procedure can be devised when it is known that the period in question
contains two, three, ···, n switches. However many switches there are, their number is assumed to be known.
2. THE METHOD OF ESTIMATION


We postulate the existence of two relationships

$$y = a_1x + b_1 + u_1 \qquad (1)$$

$$y = a_2x + b_2 + u_2 \qquad (2)$$

where u_1 and u_2 are normally and independently distributed error terms with
mean zero and standard deviations σ_1 and σ_2. These two true relations generate
a total of T observations and the problem is to estimate the point at which
the system switches from one regime to the other.³ Assume that the first t
observations are generated by (1) and the last T − t by (2). In order to estimate
t and the remaining parameters of (1) and (2) we proceed as follows. The
densities of u_1 at point i and u_2 at point j are

$$\frac{1}{\sqrt{2\pi}\,\sigma_1}\exp\left[-\frac{1}{2\sigma_1^2}(y_i - a_1x_i - b_1)^2\right]$$

and

$$\frac{1}{\sqrt{2\pi}\,\sigma_2}\exp\left[-\frac{1}{2\sigma_2^2}(y_j - a_2x_j - b_2)^2\right].$$


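The likelihoods of a sample of t observations from (1) and T − t observations
from (2) are therefore

$$\left(\frac{1}{\sqrt{2\pi}\,\sigma_1}\right)^{t}\exp\left[-\frac{1}{2\sigma_1^2}\sum_{i=1}^{t}(y_i - a_1x_i - b_1)^2\right]$$

and

$$\left(\frac{1}{\sqrt{2\pi}\,\sigma_2}\right)^{T-t}\exp\left[-\frac{1}{2\sigma_2^2}\sum_{i=t+1}^{T}(y_i - a_2x_i - b_2)^2\right],$$

and the likelihood of the entire sample is

$$\left(\frac{1}{\sqrt{2\pi}\,\sigma_1}\right)^{t}\left(\frac{1}{\sqrt{2\pi}\,\sigma_2}\right)^{T-t}\exp\left[-\frac{1}{2\sigma_1^2}\sum_{i=1}^{t}(y_i - a_1x_i - b_1)^2 - \frac{1}{2\sigma_2^2}\sum_{i=t+1}^{T}(y_i - a_2x_i - b_2)^2\right].$$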

Taking the logarithm of the likelihood function 
= — T log V2x — tloga: — (T — t) logaa 


t T 
— (1/2c,’) > (ys — airy — bi)® — (1/203) DY (ys — ane; — be)*. (3) 


j= t+l 





2 A somewhat similar problem has been discussed by E. S. Page, "A Test for a Change in a Parameter Occurring at an Unknown Point," Biometrika, 42 (1955), 523-7, and "On Problems in Which a Change in a Parameter Occurs at an Unknown Point," Biometrika, 44 (1957), 248-52. The author considers a sample $\{x_i\}$ and develops a test of the hypothesis that the sample was generated by a distribution function $F(x \mid \theta)$ against the alternative that $x_1, \ldots, x_m$ were generated by $F(x \mid \theta)$ and $x_{m+1}, \ldots, x_n$ by $F(x \mid \theta')$, $\theta \neq \theta'$.

3 Under favorable circumstances the switching point may be located by inspection of a scatter diagram. On the other hand, it is easy to think of cases in which inspection will not provide a useful estimate.
Setting the partial derivatives of (3) with respect to $a_1$, $b_1$, $a_2$, and $b_2$ equal to zero and solving the resulting equations, we obtain the usual least squares estimates

$$\hat a_1 = \frac{t\sum_{i=1}^{t}x_iy_i - \sum_{i=1}^{t}x_i\sum_{i=1}^{t}y_i}{t\sum_{i=1}^{t}x_i^2 - \left(\sum_{i=1}^{t}x_i\right)^2}, \qquad \hat b_1 = \frac{\sum_{i=1}^{t}y_i}{t} - \hat a_1\frac{\sum_{i=1}^{t}x_i}{t},$$

$$\hat a_2 = \frac{(T-t)\sum_{j=t+1}^{T}x_jy_j - \sum_{j=t+1}^{T}x_j\sum_{j=t+1}^{T}y_j}{(T-t)\sum_{j=t+1}^{T}x_j^2 - \left(\sum_{j=t+1}^{T}x_j\right)^2}, \qquad \hat b_2 = \frac{\sum_{j=t+1}^{T}y_j}{T-t} - \hat a_2\frac{\sum_{j=t+1}^{T}x_j}{T-t}.$$








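Setting the partial derivatives of (3) with respect to $\sigma_1$ and $\sigma_2$ equal to zero and substituting for $a_1$, $b_1$, $a_2$, $b_2$ their estimates, we obtain

$$\hat\sigma_1^2 = \frac{\sum_{i=1}^{t}(y_i - \hat a_1x_i - \hat b_1)^2}{t}, \qquad \hat\sigma_2^2 = \frac{\sum_{j=t+1}^{T}(y_j - \hat a_2x_j - \hat b_2)^2}{T-t}.$$

Substituting these maximum likelihood estimates back in (3), the two residual terms reduce to t/2 and (T − t)/2, and we obtain

$$L(t) = -T\log\sqrt{2\pi} - t\log\hat\sigma_1 - (T-t)\log\hat\sigma_2 - \frac{T}{2}, \qquad (4)$$

which gives the logarithm of the maximum likelihood for a given value of T and is a function of t alone.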
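The step from (3) to (4) can be checked numerically: at the least squares estimates and the maximum likelihood σ̂'s, each residual sum in (3) contributes exactly half of its group size. The plain-Python sketch below evaluates both expressions for one trial t; the function names and the ten data points are hypothetical illustrations, not taken from the paper.

```python
import math


def ols(x, y):
    """Least squares line y = a*x + b; returns (a, b, residual sum of squares)."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(u * u for u in x)
    sxy = sum(u * v for u, v in zip(x, y))
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = sy / n - a * sx / n
    rss = sum((v - a * u - b) ** 2 for u, v in zip(x, y))
    return a, b, rss


def loglik_eq3(x, y, t, a1, b1, a2, b2, s1, s2):
    """Log-likelihood (3): observations 0..t-1 follow regime 1, the rest regime 2."""
    T = len(x)
    q1 = sum((y[i] - a1 * x[i] - b1) ** 2 for i in range(t))
    q2 = sum((y[j] - a2 * x[j] - b2) ** 2 for j in range(t, T))
    return (-T * math.log(math.sqrt(2 * math.pi)) - t * math.log(s1)
            - (T - t) * math.log(s2) - q1 / (2 * s1 ** 2) - q2 / (2 * s2 ** 2))


def profile_loglik(x, y, t):
    """Concentrated log-likelihood (4) for a trial switching point t."""
    T = len(x)
    s1 = math.sqrt(ols(x[:t], y[:t])[2] / t)         # ML estimate: divisor t
    s2 = math.sqrt(ols(x[t:], y[t:])[2] / (T - t))   # ML estimate: divisor T - t
    return (-T * math.log(math.sqrt(2 * math.pi)) - t * math.log(s1)
            - (T - t) * math.log(s2) - T / 2)


# Hypothetical data; the two forms should agree at the maximum likelihood estimates.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [3.1, 3.9, 5.2, 5.8, 7.1, 9.4, 9.9, 10.6, 11.2, 11.9]
t = 5
a1, b1, rss1 = ols(x[:t], y[:t])
a2, b2, rss2 = ols(x[t:], y[t:])
s1, s2 = math.sqrt(rss1 / t), math.sqrt(rss2 / (len(x) - t))
print(profile_loglik(x, y, t), loglik_eq3(x, y, t, a1, b1, a2, b2, s1, s2))
```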

It is now desirable to find the value of t which maximizes (4). Ordinarily one would differentiate L(t) with respect to t and set the derivative equal to zero. This procedure, however, is inappropriate, since t is not continuously variable. Nor is it a reliable technique to search for a value t* of t for which L(t* − 1) < L(t*) and L(t* + 1) < L(t*), since experience suggests (see Sec. 3) that several maxima may exist, and this technique is incapable of distinguishing between them. Therefore, the following procedure is recommended: calculate the value of the likelihood function (4) for all possible values of t and select as the maximum likelihood estimate that value of t which corresponds to the maximum maximorum.

In general, the procedure is therefore as follows: order the observations according to time period, $(x_1, y_1), \ldots, (x_T, y_T)$, and divide the data into a left hand group and a right hand group. Estimate separate regression lines for the two groups. Then move the point of division between the two groups by one time unit to the right and one time unit to the left. For each of these new divisions calculate separate regression lines for the left hand group and the right hand group. Move the dividing line again and proceed in analogous fashion. For each division we can find and evaluate an expression of the form of (4). A maximum likelihood estimate for t is given by that value of t for which (4) reaches the highest maximum. For a given value of t, maximum likelihood estimates for $a_1$, $a_2$, $b_1$, $b_2$ are obtained in the usual fashion. These estimates are obviously also least squares estimates. The determination of two regression lines for a single set of data may then shed additional light on the nonlinearities of the system.
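A minimal sketch of the recommended search, assuming a single regressor ordered by time and the Sec. 3 convention that neither group may contain fewer than three observations. The function names and the data are hypothetical; the constant terms of (4) are dropped because they do not affect which t is chosen.

```python
import math


def rss_line(x, y):
    """Residual sum of squares about the least squares line y = a*x + b."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(u * u for u in x)
    sxy = sum(u * v for u, v in zip(x, y))
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return sum((v - a * u - b) ** 2 for u, v in zip(x, y))


def L_of_t(x, y, t):
    """Equation (4) without -T log sqrt(2 pi) - T/2, which does not depend on t."""
    T = len(x)
    s1 = math.sqrt(rss_line(x[:t], y[:t]) / t)
    s2 = math.sqrt(rss_line(x[t:], y[t:]) / (T - t))
    return -t * math.log(s1) - (T - t) * math.log(s2)


def estimate_switch(x, y, min_group=3):
    """Evaluate L(t) for every admissible division; return (t_hat, {t: L(t)}).

    Taking the largest value over the whole grid (the maximum maximorum)
    avoids stopping at the first local maximum.
    """
    T = len(x)
    values = {t: L_of_t(x, y, t) for t in range(min_group, T - min_group + 1)}
    t_hat = max(values, key=values.get)
    return t_hat, values


# Hypothetical data: a (5a)-like line for the first 12 periods, a (5b)-like line
# afterwards, plus a small alternating disturbance so no residual sum is exactly 0.
x = list(range(1, 21))
noise = [0.1 if i % 2 == 0 else -0.1 for i in range(20)]
y = [(2.5 + 0.7 * xi if i < 12 else 5.0 + 0.6 * xi) + noise[i]
     for i, xi in enumerate(x)]
t_hat, values = estimate_switch(x, y)
print(t_hat)
```

Because every admissible t is evaluated, a secondary local maximum such as those reported in Sec. 3 cannot trap the search. A group whose fitted line passes exactly through every point would make σ̂ zero and the logarithm undefined, so very small groups in real data may need a guard.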

It is natural to ask at this point whether it is possible to devise a test of the 
hypothesis that no switch occurred during the period considered. The following 
avenues of approach seem possible. 

1. A likelihood ratio test may prove useful in testing the hypothesis that no switch occurred against the single alternative that one switch took place. The likelihood ratio λ is defined as

$$\lambda = \frac{L(\hat\omega)}{L(\hat\Omega)},$$

where $L(\hat\Omega)$ is the unrestricted maximum of the likelihood function over the entire parameter space Ω and $L(\hat\omega)$ is the maximum of the likelihood function over the subspace ω ⊂ Ω to which one is restricted by the hypothesis.⁴ Substituting for $L(\hat\omega)$ and $L(\hat\Omega)$ one obtains

$$\lambda = \frac{\hat\sigma_1^{\,t}\,\hat\sigma_2^{\,T-t}}{\hat\sigma^{\,T}},$$

where $\hat\sigma$ is the standard error of estimate for a regression taking into account all observations. Under certain conditions,⁵ the Chi square distribution with n − m degrees of freedom is an acceptable approximation to the distribution of −2 log λ for large T, where n is the dimensionality of Ω and m the dimensionality of ω.⁶ Although the conditions under which the Chi square distribution is a reasonable approximation are not fulfilled here, the conjecture that it might still be acceptable for the present purpose is somewhat strengthened by the fact that one of the reasons for the nonfulfillment of these conditions is the discreteness of t, which would tend to cause less significant distortions as T increases.⁷
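The statistic −2 log λ needs only three residual sums: one per regime and one for the common regression. The sketch below takes t as given (in practice the maximizing t̂ would be plugged in, which is exactly why the χ² approximation is only conjectural here) and uses the closed form of the χ² upper tail for 4 degrees of freedom, the n − m of the present example. Function names and data are hypothetical.

```python
import math


def rss_line(x, y):
    """Residual sum of squares about the least squares line y = a*x + b."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(u * u for u in x)
    sxy = sum(u * v for u, v in zip(x, y))
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return sum((v - a * u - b) ** 2 for u, v in zip(x, y))


def lr_statistic(x, y, t):
    """-2 log(lambda), lambda = s1**t * s2**(T - t) / s**T, using ML sigma estimates."""
    T = len(x)
    s1 = math.sqrt(rss_line(x[:t], y[:t]) / t)
    s2 = math.sqrt(rss_line(x[t:], y[t:]) / (T - t))
    s = math.sqrt(rss_line(x, y) / T)     # one common regression on all T points
    return 2 * (T * math.log(s) - t * math.log(s1) - (T - t) * math.log(s2))


def chi2_sf_4df(q):
    """P(chi-square with 4 df > q); for 4 df the tail is exactly exp(-q/2) * (1 + q/2)."""
    return math.exp(-q / 2) * (1 + q / 2)


# Hypothetical data with a trial switch after the fifth observation.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [3.1, 3.9, 5.2, 5.8, 7.1, 9.4, 9.9, 10.6, 11.2, 11.9]
stat = lr_statistic(x, y, 5)
print(stat, chi2_sf_4df(stat))
print(round(chi2_sf_4df(14.610), 4))   # about 0.0056, i.e. below the 1% level
```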

2. A small sample test may be utilized in the following fashion. Given a division of T observations into two groups, one may test the hypothesis that the two regressions are the same.⁸ Denoting by $S_1$ the sum of the sums of squares of deviations from the two separate regression lines fitted to the two groups of observations and by $S_2$ the sum of squares of deviations from a common





4 A. M. Mood, Introduction to the Theory of Statistics. New York: McGraw-Hill, 1950, pp. 257-9.

5 Ibid., p. 211.

6 In the present example n − m = 7 − 3 = 4.

7 For example, on the assumption that t is continuously variable (i.e., on the assumption that continuous sampling is possible) it is more plausible to assert that the derivative of the likelihood function would vanish at the maximum, which is one of the conditions under which the Chi square distribution becomes acceptable.

8 C. R. Rao, Advanced Statistical Methods in Biometric Research. New York: Wiley, 1952, pp. 112-4.
regression based on all observations, the quantity $(S_2 - S_1)/S_1$ has the F-distribution with 4 and T − 8 degrees of freedom.

A difficulty that arises in connection with this test is that the dividing point between the left hand and right hand regimes is not given exogenously but is presumably a maximum likelihood estimate based on the observations. This will tend to make $S_1$ smaller than it would be if the value of t were given exogenously, and it will also make the variance ratio larger. If the critical values of the F-distribution with 4 and T − 8 degrees of freedom are used, the procedure will result in the rejection of the hypothesis more frequently than would otherwise be the case. The reason for this is that the determination of t from the data reduces the degrees of freedom for the denominator and increases the degrees of freedom for the numerator of the variance ratio.

This difficulty could be avoided by not using a maximum likelihood estimate for t but arbitrarily deciding upon t such that t = T/2 if T is even and t = (T + 1)/2 (or (T − 1)/2) if T is odd. Although this procedure eliminates one difficulty, it creates another in that either the left hand or the right hand regression is likely to be contaminated with observations from the other regime. This will generally impair the power of the test. The optimal procedure may well be to eliminate the central observations altogether and consider the left hand regression to consist of the first k (k < (T − 1)/2) and the right hand regression to consist of the last k observations. This procedure will reduce the probability of contamination of one regression with observations from the other regime.

The power of both tests clearly depends on how close the true switching point t is to either endpoint of the series. The closer t is to either endpoint, the lower the power of the tests. Consider the true regressions A and B in Fig. 877. If out of a total of 100 observations the first 3 are generated by B and the last 97 by A, the common regression will tend to look like C; if B


Fig. 877. The positions of hypothetical common regressions as determined by the distance of the true switching point from the midpoint of the sample.
generates the first 50 and A the last 50 observations, the common regression will appear as D. The sum of the squared residuals and the standard error of estimate for the common regression will tend to be larger for D than for C. Thus the likelihood ratio will tend to be smaller and the variance ratio larger for D than for C. This means that the false hypothesis is the more likely to be rejected the closer t is to the midpoint of the series.

The analysis is not affected by the number of explanatory variables in the regression. If $y = a_0 + a_1x_1 + \cdots + a_rx_r + u_1$ and $y = a_0' + a_1'x_1 + \cdots + a_r'x_r + u_2$, where the u's are normally and independently distributed with mean zero and standard deviations $\sigma_1$ and $\sigma_2$, the likelihood function is of the same general form as before. The likelihood ratio and the variance ratio remain the same. If the Chi square distribution is used as an approximation to −2 log λ, the relevant degrees of freedom are 2(r + 2) + 1 − (r + 2) = r + 3, since each regime has r + 1 coefficients and a standard deviation, the switching point adds one further parameter, and the hypothesis of no switch leaves r + 2 parameters.


3. AN EMPIRICAL TEST 


An empirical test of this line of reasoning was carried out in the following fashion. Two true regressions were assumed, namely

$$y = 2.5 + .7x + u_1 \qquad (5a)$$

$$y = 5 + .6x + u_2 \qquad (5b)$$

It was assumed that observations (x, y) were available for 20 consecutive periods. These observations were generated as follows: (1) The range of x was restricted to the integers from 1 to 20 inclusive. The 20 x-observations therefore consisted of the numbers 1, 2, ..., 19, 20. (2) The order of the x values was determined by utilizing tables of random numbers.⁹ The table was read horizontally from left to right, in two-digit groups. Random numbers higher than 20 and repeats were discarded. (3) The ordered set of x values was divided into a left hand set and a right hand set. The table of random numbers was read in one-digit groups, one digit corresponding to each number of the entire set of x. If an odd random number was read, the corresponding x was placed in the left hand set; if an even random number was read, it was placed in the right hand set. (4) The order in which x's were placed in the left hand and right hand sets was preserved. (5) As a result of this procedure the following left and right hand sets of x's were generated: Left hand set: 4, 13, 5, 2, 6, 8, 1, 12, 17, 20, 15, 11; Right hand set: 3, 14, 16, 10, 7, 19, 18, 9. Thus the left hand set consists of 12 and the right hand set of 8 observations. (6) The expected values of y were then calculated, using (5a) for the left hand set and (5b) for the right hand set. (7) Normally distributed errors were then added to the y-values so computed by utilizing tables of normal deviates.¹⁰ The paired (x, y) observations are given in Table 879.
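The sampling design of steps (1) to (7) can be imitated in a few lines. This is a sketch under stated assumptions: Python's `random` module stands in for the RAND tables, the normal errors are taken to have unit variance (the text does not state σ₁ and σ₂ for the experiment), and the regression coefficients default to those printed in (5a) and (5b). The function name is hypothetical.

```python
import random


def simulate_two_regime_sample(seed=0, regime1=(2.5, 0.7), regime2=(5.0, 0.6), sigma=1.0):
    """Imitate steps (1)-(7) of Sec. 3; regimes are (intercept, slope) pairs.

    Returns (x, y, t_true), where the first t_true periods follow regime 1.
    """
    rng = random.Random(seed)
    xs = list(range(1, 21))              # step (1): x takes the integers 1..20
    rng.shuffle(xs)                      # step (2): random order
    left, right = [], []
    for x in xs:                         # steps (3)-(4): one random digit per x,
        if rng.randrange(10) % 2 == 1:   # odd digit -> left hand set,
            left.append(x)
        else:                            # even digit -> right hand set,
            right.append(x)              # order within each set preserved
    # steps (6)-(7): expected values from the two regimes plus normal errors
    y = [regime1[0] + regime1[1] * x + rng.gauss(0.0, sigma) for x in left]
    y += [regime2[0] + regime2[1] * x + rng.gauss(0.0, sigma) for x in right]
    return left + right, y, len(left)


x, y, t_true = simulate_two_regime_sample(seed=1)
print(t_true, x)
```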

The problem was to determine the position of the true switching point 
(between the 12th and 13th observations). Separate regressions were computed 
for all possible ways in which the observations can be broken into two groups 
without disturbing their order except that no regression was computed for 
groups containing fewer than three observations. On the basis of these calculations the points of the likelihood function (4) were obtained and plotted, up to an additive constant. The likelihood function is shown in Fig. 879. It clearly reaches a maximum maximorum at the point t = 12, which corresponds to the true switching point. However, the likelihood function does have two other maxima (at 10 and 15) and three minima (at 4, 11, and 13). The irregular behavior of the likelihood function toward the two extremes is not unexpected, since at the extremes the estimates of the error variance are based on very few observations.

9 A Million Random Digits with 100,000 Normal Deviates. The RAND Corporation. Glencoe: Free Press, 1955.
10 Ibid.

TABLE 879
OBSERVATIONS FOR THE VARIABLES FOR 20 CONSECUTIVE PERIODS

  t     x       y
  1     4     3.473
  2    13    11.555
  3     5     5.714
  4     2     5.710
  5     6     6.046
  6     8     7.650
  7     1     3.140
  8    12    10.312
  9    17    13.353
 10    20    17.197
 11    15    13.036
 12    11     8.264
 13     3     7.612
 14    14    11.802
 15    16    12.551
 16    10    10.296
 17     7    10.014
 18    19    15.472
 19    18    15.650
 20     9     9.871





Fig. 879. Values of the logarithm of the likelihood function plotted up to an additive constant.
This procedure is still recommended even if it is doubtful whether there exists a switching point at all. In such a case the likelihood function may be expected to show no obvious maximum. The hypothesis that no switch has occurred may then be tested against the single alternative of one switching point by a likelihood ratio test. The tests suggested in Sec. 2 were carried out for the present example. The value of −2 log λ was 14.610. This was compared to the critical values of the χ² distribution with 4 degrees of freedom. The observed value is significant at the 1% level. The variance ratio F is 1.044, which is significant only at the .50 level and does not permit the rejection of the hypothesis. It appears reasonable to assume that the power of both tests will be greater the more pronounced the shift in the regression line.


4. CONCLUSION 


The purpose of this paper was to derive a method of estimating the position 
of a single switching point for a linear regression system obeying two regimes. 
It has been found that the switching point can be estimated most effectively by 
examining the appropriate likelihood function, but the procedure depends upon 
the knowledge that there is at most one switching point. (In case of more than 
one switching point the exact number of switches must be assumed to be 
known.) A likelihood ratio test and a small sample test based on the F distri- 
bution are proposed for testing the hypothesis that no switch occurred against 
the single alternative of one switch. The power of the tests depends upon the 
magnitude of the switch. The procedure was illustrated in terms of a sampling 
experiment in which the true switching point was correctly estimated. 
ALTERNATIVE DEFINITIONS OF THE SERIAL CORRELATION
COEFFICIENT IN SHORT AUTOREGRESSIVE SEQUENCES* 


ABBOTT S. WEINSTEIN
New York State Department of Mental Hygiene 


Various definitions of the serial correlation coefficient are considered 
in relation to alternative methods of estimating autoregressive parame- 
ters from short time-series. It is concluded that estimates based on 
three experimental series considered are affected less by changes in 
definition of the serial correlation coefficient than by changes in the 
basic method of estimation. However, two definitions are shown to be 
dependent on the size of the correction for the mean and are rejected. 
The circular definition is seen to be inappropriate in short series not 
circularly related. A new definition of the serial correlation coefficient 
is introduced and its efficacy is illustrated. 


In a recent paper, methods of estimating autocorrelation and autoregression coefficients from short time-series were proposed by the author [27]. The improvement over the usual methods was illustrated. However, in the illustration a new definition of the serial correlation coefficient was introduced, and the empirical efficacy of the method may depend on the particular definition of the coefficient used. The main purpose of this paper is to consider that possibility.

THE DEVELOPMENT OF APPROXIMATE DEFINITIONS
OF THE SERIAL CORRELATION COEFFICIENT


In time-series analysis, methods involving the computation of large numbers 
of serial correlation or regression coefficients are common—and are likely to 
become more so with the increasing use of automatic high-speed computing 
machines. Examples of such methods are correlogram analysis [9, 10], con- 
fluence analysis [6, 22], and sampling studies in methodology [4, 17, 18, 19, 20]. 

Largely as a result of the labor necessary to compute large numbers of 
product-moment correlation coefficients, various approximate definitions have 
been proposed. For a time, such definitions were judged almost solely by how 
well they approximated the ordinary definition [10]. However, the ordinary 
definition, based on autoregressive sequences (e.g. equation 6, below), is biased 
and some of its “substitutes” may even be better [11]. Parenthetically, it is 
open to question whether some of the definitions reduce the computational 
burden substantially. 

Several of the definitions provide adequate estimates of the autocorrelation 
coefficient when applied to long series. In short series, however, all the defini- 
tions that have been proposed are biased. That is, the sample coefficients tend 
to be substantially below their corresponding parameter where the latter is 
high and positive as it usually is for the first two lags in actual autoregressive 
series ([27] and references). 





* The author wishes to express his indebtedness to the late Arthur Hendler of Rensselaer Polytechnic In- 
stitute for his helpful comments. 
It should be noted that in time-series, "long" and "short" are not strictly analogous to "large" and "small" as used to describe samples in frequency distribution analysis. In customary frequency distribution analysis, observations are independent and it is usually a simple matter to determine the degrees of freedom. In time-series, unless the parent structure is known, the effective degrees of freedom cannot be determined with any certainty.¹ Thus, a time-series containing a large number of observations may contain no more useful information than a sample consisting of a much smaller number of independent observations.


DEFINITIONS OF THE SERIAL CORRELATION COEFFICIENT 
The ordinary definition of the serial correlation coefficient, $r_s$, is given by

$$r_s = \frac{\sum_{i=1}^{n-s} x_i x_{i+s}}{\sqrt{\sum_{i=1}^{n-s} x_i^2 \, \sum_{i=1+s}^{n} x_i^2}}, \qquad (1.1)$$

where the $x_i$ are a series of variates with zero mean and s is the order of serial correlation (length of lag).
If the series is taken as circular, i.e. if $x_{n+i}$ is taken to be $x_i$,
which has been discussed by Anderson [1] and others (e.g. [5, 13, 14, 16, 21]).
This definition has been used extensively as a testing device in theoretical 
studies. 

An approximation of the circular definition, where the series is not circular,

$$r_s = \frac{n\sum_{i=1}^{n-s} x_i x_{i+s}}{(n-s)\sum_{i=1}^{n} x_i^2}, \qquad (3.1)$$

was proposed by Orcutt [17].
Alternatively, since

$$2x_ix_{i+s} = x_i^2 + x_{i+s}^2 - (x_i - x_{i+s})^2,$$

we may rewrite (2.1),





1 See [2] and discussion. For attempts to estimate the effective degrees of freedom, see Bayley and Hammersley [3] and Yule [28].
where the quantity in brackets is von Neumann's ratio of the mean square successive difference to the variance [7, 8, 24, 25]. Where the series is noncircular, the definition becomes

$$r_s = 1 - \frac{n\sum_{i=1}^{n-s}(x_i - x_{i+s})^2}{2(n-s)\sum_{i=1}^{n} x_i^2}, \qquad (4.1)$$

which was used extensively by Cochrane and Orcutt [4, 18].
Returning to the ordinary definition, the denominator of (1.1) is actually the geometric mean of

$$\sum_{i=1}^{n-s} x_i^2 \quad\text{and}\quad \sum_{i=1+s}^{n} x_i^2.$$

The arithmetic mean will be higher (it is never lower than the geometric mean), making $r_s$ smaller in absolute value. However, where $\sum_{i=1}^{n-s} x_i^2$ and $\sum_{i=1+s}^{n} x_i^2$ are approximately equal, as is usually the case where s is low, the difference should be small. Thus, we have
which is more easily computed than (1.1) and was used by the author [27].


END-EFFECTS 


In correlating two different variables x and y, each $x_i$, i = 1, 2, ..., n, is paired with the corresponding $y_i$. Each $x_i$ and $y_i$ is represented once in the numerator of the formula for r, and each $x_i^2$ and $y_i^2$ is represented once in the denominator. In correlating $x_i$ and $x_{i+s}$, however, the variates are not represented equally. Where $r_s$ is defined according to (1.1), for example, the first and last s terms are used once each as factors in the numerator and as squares in the denominator. The middle n − 2s terms are each used twice. Where n = 3 and s = 1,

$$r_1 = \frac{x_1x_2 + x_2x_3}{\sqrt{(x_1^2 + x_2^2)(x_2^2 + x_3^2)}},$$

in which $x_1$ and $x_3$ each have half the weight of $x_2$.
In (1.1), (4.1) and (5.1) it is assumed that $\sum_{i=1}^{n-s} x_i^2$ and $\sum_{i=1+s}^{n} x_i^2$ are approximately equal. Where the difference between $\sum_{i=1}^{s} x_i^2$ and $\sum_{i=n-s+1}^{n} x_i^2$ is great and either quantity is numerically large, this condition will obviously be violated, and estimates according to the various definitions may be in poor agreement.
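The unequal representation of the end terms can be counted directly. The hypothetical helper below tallies how often each $x_i$ enters the numerator of (1.1) at lag s; it is an illustration of the counting argument, not part of any estimation method in the paper.

```python
def term_usage(n, s):
    """How many times each x_i (listed in order 1..n) enters the numerator of (1.1)."""
    counts = [0] * n
    for i in range(n - s):     # each product x_i * x_{i+s}, 0-based indices
        counts[i] += 1
        counts[i + s] += 1
    return counts


print(term_usage(3, 1))   # [1, 2, 1]: x_1 and x_3 have half the weight of x_2
print(term_usage(6, 2))   # [1, 1, 2, 2, 1, 1]: the middle n - 2s terms are used twice
```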

The problem noted in the preceding paragraph will be completely avoided if the series is treated as circular, i.e. if $r_s$ is defined according to (2.1). However, where there is no real relationship between (e.g.) $x_n$ and $x_1$, $\frac{1}{n}\sum x_ix_{i+s}$ will be smaller than the true covariance of $x_i$ and $x_{i+s}$; since $E\{x_i^2\}$ is equal to the population variance, $r_s$ will be less than $\rho_s$, the parent correlation coefficient. It has been suggested by Quenouille [19] that the transition from $x_n$ to $x_{n+1} = x_1$, etc., be smoothed artificially, but this was intended for long series. Applied to most actual series, where n is quite small, artificial smoothing would rob the coefficients of much of their meaning. Watson and Durbin [26] have proposed noncircular statistics for which exact distributions can be obtained from Anderson's [1] results for the circular definition. However, these sacrifice some relevant information by omitting central pairs and are of questionable validity in very short series. Quenouille [19] has proposed other definitions of the serial correlation coefficient in which the end-terms are weighted fractionally. However, these involve greater computational labor than the approximate definitions, and they are not considered here.


CORRECTION FOR THE MEAN 


It has been shown by Orcutt [17] and discussed by Quenouille [19], Marriott and Pope [15] and Kendall [12] that the serial correlation coefficient is badly biased where the parent mean is unknown and must be estimated from the sample. This bias can be explained easily by using (4.1) with the series of observed variates, $X_i = x_i + \bar X$:
Note that while the numerator is independent of the mean, the denominator
is not. If the sample mean, X, differs from the true mean, the denominator will 
be smaller than it should be, since the deviations around X are minimized. 
Thus, the quantity to the right of the minus sign will be biased upward and 
r will be biased downward. 
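The downward bias from estimating the mean can be seen in a small simulation. As simplifying assumptions, the sketch uses a first-order scheme x_t = ρx_{t−1} + e_t with ρ = 0.7 and standard normal e_t rather than the second-order scheme (6), a length of 14 terms, and definition (1.1) applied either to the zero-mean series itself or after subtracting the sample mean. It is meant to show the direction of the bias only; function names and settings are hypothetical.

```python
import math
import random


def r_ordinary(x, s=1):
    """Definition (1.1) applied to whatever series is passed in."""
    n = len(x)
    num = sum(x[i] * x[i + s] for i in range(n - s))
    return num / math.sqrt(sum(v * v for v in x[:n - s]) * sum(v * v for v in x[s:]))


def ar1_series(n, rho, rng, burn=50):
    """Zero-mean AR(1): x_t = rho * x_{t-1} + e_t with e_t ~ N(0, 1); burn-in dropped."""
    x, out = 0.0, []
    for t in range(n + burn):
        x = rho * x + rng.gauss(0.0, 1.0)
        if t >= burn:
            out.append(x)
    return out


def average_r1(n, rho, reps, subtract_sample_mean, seed=1):
    """Average lag-1 coefficient over reps simulated series (same seed -> same series)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        x = ar1_series(n, rho, rng)
        if subtract_sample_mean:
            m = sum(x) / n
            x = [v - m for v in x]
        total += r_ordinary(x, 1)
    return total / reps


uncorrected = average_r1(14, 0.7, 2000, subtract_sample_mean=False)
corrected = average_r1(14, 0.7, 2000, subtract_sample_mean=True)
print(f"mean r1, true mean used:   {uncorrected:.3f}")
print(f"mean r1, sample mean used: {corrected:.3f}")
```

Both averages fall below 0.7, and the mean-corrected one falls further, which is the pattern described above.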

The definitions (1.1) through (5.1) are based on the assumption that the true 
mean is zero. Where the parent mean is not known, the corrected definitions 
(1.2) through (5.2) correspond to (1.1) through (5.1) respectively: 
2 The corrected definition (3.2) is only approximately equal to Orcutt's [17], in which the numerator is $n\sum(X_i - \bar X)(X_{i+s} - \bar X)$.
(5.2)
Again, the problem is complicated by end-effects. The correction for the mean may increase the bias in (1.2) where $\bar X'$ and $\bar X''$ are appreciably different, since both are in effect estimates of a single "true" mean. This bias can be roughly offset by using a single estimate of the mean, as in (2.2) through (5.2), but another problem is introduced in (3.2), (4.2) and (5.2). If $X_1$ and $X_n$, for example, are much larger than the single estimate of the mean, $\bar X$, the latter will be greater than $\bar X'$ and $\bar X''$ and, when squared, will seriously overcorrect.

An important property of a dependable corrected definition is its independ- 
ence of the size of the correction. It is easily shown that (3.2) and (5.2) are 
highly dependent on the relative size of the corrections, so unless the mean is 
fairly small, these definitions cannot be recommended. This will be illustrated 
below. 

The correction can be made quite accurate if the mean is estimated by averaging $\bar X'$ and $\bar X''$, as in the new definition, (5.3):

n—s » > Hag 2 
ILLUSTRATION OF COEFFICIENTS ACCORDING TO VARIOUS DEFINITIONS
To illustrate the effect of the definition on estimates of autocorrelation coefficients, three experimental series, the parameters of which are known, have been employed. Specifically, we shall refer to Kendall's [10] series numbers 1, 2 and 3, constructed according to the linear autoregressive scheme

$$x_{t+2} + a_1x_{t+1} + a_2x_t = \epsilon_{t+2}, \qquad (6)$$

in which $\epsilon_t$ is a random disturbance with zero mean and the t's are equidistant points in time. The $\epsilon_t$ were taken from tables of random numbers and adjusted to range from −49 to +49. The series were constructed with the following a's:³








Series No. 











1 
2 
3 








3 Schemes other than (6) and a's other than those considered here are possible and will be examined in a future 
study. See, for example, Orcutt [17] or Cochrane and Orcutt [4]. 







The true values $\rho_1$ and $\rho_2$ are:

Series No.    ρ1      ρ2
    1        .733    .306
    2        .850    .606
    3        .615   −.217
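For the scheme (6), the theoretical autocorrelations follow from the Yule-Walker relations. The sketch below recovers ρ₁ and ρ₂ for series 1 from a₁ = −1.1 and a₂ = 0.5 (the population values reported for that series in Table 890) and also generates a series from (6). The generator is an assumption-laden sketch: it draws integer disturbances uniformly from −49 to +49 with Python's `random` module, which imitates but does not reproduce Kendall's construction. Function names are hypothetical.

```python
import random


def ar2_autocorrelations(a1, a2, max_lag=2):
    """Theoretical rho_1..rho_max_lag for x_{t+2} + a1 x_{t+1} + a2 x_t = e_{t+2}."""
    phi1, phi2 = -a1, -a2               # same scheme as x_t = phi1 x_{t-1} + phi2 x_{t-2} + e_t
    rho = [1.0, phi1 / (1 - phi2)]      # Yule-Walker equation at lag 1
    for _ in range(2, max_lag + 1):
        rho.append(phi1 * rho[-1] + phi2 * rho[-2])   # recursion for lags >= 2
    return rho[1:]


def scheme6_series(a1, a2, n, seed=0, burn=100):
    """n terms from (6) with integer disturbances uniform on -49..49 (an assumption)."""
    rng = random.Random(seed)
    prev2, prev1, out = 0.0, 0.0, []
    for t in range(n + burn):
        x = -a1 * prev1 - a2 * prev2 + rng.randint(-49, 49)
        prev2, prev1 = prev1, x
        if t >= burn:
            out.append(x)
    return out


print(ar2_autocorrelations(-1.1, 0.5))   # roughly [0.733, 0.307]
series = scheme6_series(-1.1, 0.5, 168)  # 168 terms, as used in the next section
```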











As discussed in [27] we want to estimate the constants for annual series 
represented by (6). If we have, say, only 14 years of observations, the estimates 
will be badly biased. To overcome the bias, we use all the monthly observations 
to provide 12 annual series of 14 observations each, the first series consisting 
of the 14 January observations, the second, of the 14 February observations, 
etc. It is assumed that each of the 12 series has the same parametric structure. 
The rationale for this assumption is given in [27]. 

In order to make Kendall's experimental series analogous to actual series covering 14 years of monthly⁴ observations, the first 168 terms in each series


TABLE 887

r1 AND r2 ACCORDING TO SPECIFIED DEFINITIONS
KENDALL'S SERIES NO. 1, ρ1 = +.733 AND ρ2 = +.306








n 





2.2 8.2 4.2 





-610 .677 .602 


-468 .445 
-052 .252 .261 
-201 .270 .299 
-758 .904 
-492 .620 .655 
-432 .540 
-238 .428 
-462 .587 
-547 531 
-465 


omnoaraonr 


-628 .669 
441 .543 


548 
-647 
-461 


-562 
-629 
.538 


-620 
-503 
- 562 
have been broken into 12 segments ("months") of 14 terms ("years") each. The first segment consists of the first 14 terms of the original series; the second, of terms 15 through 28; etc. Series analogous to annual averages were constructed by adding the corresponding terms of the 12 segments. That is, the terms of the "average" segment are

$$X_i = \sum_{j=1}^{12} X_{ij},$$

where i refers to the year and j to the month. Actually, the segment consists of totals rather than averages. This saves a step in the computation without affecting the coefficients. The serial correlation coefficients for each segment and specified combinations of segments, according to the various definitions, are shown in Tables 887, 888, and 889 for series 1, 2, and 3, respectively.
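The segmentation into "months" and the "average" segment of totals can be written as two short helpers. They are a sketch assuming the long series is held in a Python list; the names are hypothetical.

```python
def split_into_segments(series, n_segments=12, seg_len=14):
    """Segment j (0-based) holds terms j*seg_len .. (j+1)*seg_len - 1 of the long series."""
    needed = n_segments * seg_len
    if len(series) < needed:
        raise ValueError(f"need at least {needed} terms, got {len(series)}")
    return [series[j * seg_len:(j + 1) * seg_len] for j in range(n_segments)]


def total_segment(segments):
    """X_i = sum over j of X_ij: termwise totals across the segments."""
    return [sum(col) for col in zip(*segments)]


segments = split_into_segments(list(range(168)))   # hypothetical stand-in series
totals = total_segment(segments)
print(len(segments), len(totals), totals[:3])
```

Every definition of r_s considered here is a ratio whose numerator and denominator scale by the same factor, so dividing the totals by 12 would leave the coefficients unchanged; this is why totals can stand in for averages.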


ESTIMATES OF AUTOREGRESSIVE PARAMETERS 


Estimates of the autocorrelation coefficients and corresponding autoregres- 
sion coefficients are shown in Table 890 for each of the definitions of the serial 
correlation coefficient considered above. The estimates are based on the average 
segment, the mean of the serial correlation coefficients of the 12 segments, the 


TABLE 888

r1 AND r2 ACCORDING TO SPECIFIED DEFINITIONS
KENDALL'S SERIES NO. 2, ρ1 = +.850 AND ρ2 = +.606
single long series and the author's methods I and II. For the development and description of these methods of estimation (methods I and II), see [27].
Estimates of the autoregressive constants, $a_1$ and $a_2$, were made using

$$\hat a_1 = -\frac{r_1(1 - r_2)}{1 - r_1^2} \quad\text{and}\quad \hat a_2 = \frac{r_1^2 - r_2}{1 - r_1^2}, \qquad (7)$$

in which the $\hat a$'s are estimates of the a's and the r's are estimates of the ρ's.
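Equation (7) is a direct computation. As a sanity check, substituting the population values of series 1 (ρ₁ = .733, ρ₂ = .306) should give back approximately the a₁ = −1.1 and a₂ = 0.5 used to construct it. The function name is hypothetical.

```python
def ar2_from_autocorrelations(r1, r2):
    """Equation (7): estimates of a1, a2 in x_{t+2} + a1 x_{t+1} + a2 x_t = e_{t+2}."""
    d = 1 - r1 ** 2
    return -r1 * (1 - r2) / d, (r1 ** 2 - r2) / d


a1_hat, a2_hat = ar2_from_autocorrelations(0.733, 0.306)
print(round(a1_hat, 3), round(a2_hat, 3))   # close to -1.1 and 0.5
```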

In Table 890 the estimates are compared with the respective true values of 
the coefficients. Of the various methods of estimation, Method I provides the 
most useful results for series no. 1. For series 2 and 3, the combined series of 
12 segments uncorrected for bias and Method II give estimates nearest the 
truth. 

As would be expected from the discussion in the section on “End-Effects” 
above, the circular definition (2.2) is clearly inappropriate in these examples. 
Considering only the other definitions, the estimates appear far more sensitive 
to the particular method of estimation than to the definition of the serial cor- 
relation coefficient employed. In the present examples, definitions (4.2), (5.2) 
and (5.3) compare very favorably with the ordinary definition, (1.2). 


TABLE 889

r1 AND r2 ACCORDING TO SPECIFIED DEFINITIONS
KENDALL'S SERIES NO. 3, ρ1 = +.615 AND ρ2 = −.127
EFFECT OF SIZE OF THE MEAN ILLUSTRATED


To show the effect of the size of the mean on the corrected coefficients as 
variously defined, the coefficients have been computed from the total (average) 
segment of Kendall's series No. 1: (a) with zero mean, (b) as developed originally by adding corresponding terms of the 12 segments ($\bar X = -67$), and (c) with 400 added to each term of (b) ($\bar X = 333$).

TABLE 890

ESTIMATES OF AUTOCORRELATION AND AUTOREGRESSION COEFFICIENTS ACCORDING TO SPECIFIED DEFINITIONS
OF r_s AND METHODS OF ESTIMATION
KENDALL'S SERIES NO. 1, 2 AND 3

Kendall's Series No. 1 Kendall's Series No. 2 Kendall's Series No. 3

Population Values
−1.1 0.5 | .850 .606 -2.2 0.4 | .615

Average Segment
840 «6.43880 «6-1.60 91
-856 .549 −1.44
787 1.78 .01
-860 .542 -1.52 .76
-623 −1.35 .55
-834 .425 -1.58 .89

Mean of 12 Individual Segments
-46 | .710 = .350 93 31
-33 | .658 .288 83 .26
-38 | .712 .348 94 .32
-41) .713 = .365 -92 .22
44) .696 .275 98 «Al
-46 | .701 = .324 -93 33

Combined Series of 12 Segments
-42/| .853 .628 −1.16
-30 | .812 .583 −.99
-37 | .870 .657 −1.22
-40| .853 .631 −1.16
-42| .856 .640 −1.15
-42/| .853 .626 −1.17

Method I
-927 .744
-906 .748
-943 «=. 769
-922 .738
-941 = .821
-905 .756

Method II
-873 =.682
-831
895 . =I.
872 .682 -1.
-876 § .696
873 .681 --1.


The three cases follow: 








Term    1     2     3     4     5     6     7     8     9    10    11    12    13    14
(a)   −34  −139   −92    47   207   188   113   201   −15  −253  −172  −110   −16    72
(b)  −101  −206  −159   −20   140   121    46   134   −82  −320  −239  −177   −83     5
(c)   299   194   241   380   540   521   446   534   318    80   161   223   317   405











The coefficients, according to specified definitions of r_s, are:

                    r_1                      r_2
Definition    (a)    (b)    (c)       (a)     (b)     (c)
(1.2)        .628   .628   .628      .081    .081    .081
(2.2)        .610   .610   .610      .039    .039    .039
(3.2)        .667   .677   .617      .086    .053    .251
(4.2)        .602   .602   .602     −.025   −.025   −.025
(5.2)        .627   .630   .608      .078    .049    .197
(5.3)        .626   .626   .626      .077    .077    .077











Note that the estimates according to definitions (3.2) and (5.2) are altered by
changes in the size of the mean. Unless the mean is relatively small, these
definitions should be avoided. Note further that definition (3.2) produced r's
greater than +1 for segment 9 of series no. 2 (Table 888).
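The sensitivity to the mean noted above can be checked directly on the three series of the worked example. The minimal Python sketch below uses two hypothetical lag-1 formulas. `r1_mean_corrected` works with deviations about the series mean, and `r1_uncorrected` works with the raw values. This excerpt does not spell out the paper's numbered definitions, so which formula corresponds to which definition is an assumption. Only series (b) is typed in. Series (c) adds 400 to each term of (b), as the text states, and (a) is rebuilt here as (b) plus 67, which gives a mean close to zero.

```python
# Two illustrative lag-1 serial correlation formulas (hypothetical stand-ins,
# not necessarily the paper's numbered definitions).

def r1_mean_corrected(x):
    """Lag-1 serial correlation computed from deviations about the series mean."""
    n = len(x)
    m = sum(x) / n
    num = sum((x[t] - m) * (x[t + 1] - m) for t in range(n - 1))
    den = sum((v - m) ** 2 for v in x)
    return num / den


def r1_uncorrected(x):
    """The same ratio computed from raw values, with no mean correction."""
    num = sum(x[t] * x[t + 1] for t in range(len(x) - 1))
    den = sum(v * v for v in x)
    return num / den


# Series (b) of the worked example; (a) and (c) differ from it only by constants.
series_b = [-101, -206, -159, -20, 140, 121, 46, 134, -82, -320, -239, -177, -83, 5]
series_a = [v + 67 for v in series_b]   # mean close to zero
series_c = [v + 400 for v in series_b]  # mean close to 333

for name, s in (("a", series_a), ("b", series_b), ("c", series_c)):
    print(name, round(r1_mean_corrected(s), 4), round(r1_uncorrected(s), 4))
```

Adding a constant leaves every deviation from the mean unchanged. The first formula therefore returns the same value for all three series, while the second changes with the size of the mean.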


CONCLUSIONS 


1. In the present cases, the estimates are far more sensitive to the particular 
method of estimation than to the definition of the serial correlation coefficient 
employed. The differences in estimates due to using definitions (4.2), (5.2) or 
(5.3) are generally small. 

2. Definition (5.2) is not independent of the size of the mean, so unless the 
means are quite small, it should not be used. Instead, the use of (4.2) or (5.3) 
is recommended. 

3. The circular definition is inappropriate for short series not circularly 
related. 
A STOCHASTIC ANALYSIS OF THE SIZE
DISTRIBUTION OF FIRMS* 


IRMA G. ADELMAN
Mills College 

In the first section of this paper, the probabilistic method of Markov 
is applied to the analysis of an industrial structure. The concept of 
dynamic equilibrium for a group of firms is introduced, and a technique 
for exploring the characteristics of this state is presented. The applica- 
bility of this method is then illustrated by using it to investigate trends 
with respect to the concentration and mobility of firms in the iron and
steel industry in the United States.


1. INTRODUCTION


THE forces determining the distribution of firm sizes within a particular
industry are so varied and so complex that any theoretical attempt to
portray the effects of their interactions must of necessity be either drastically 
simplified or else hopelessly complicated. On the other hand, since a major 
portion of the literature in industrial organization is devoted to a study of the 
relationship between market structure and firm behavior, even a simplified 
model for predicting the equilibrium composition of an industry may not be 
without interest. 

Our primary purpose in this paper is to adapt the probabilistic method which
is due to Markov¹ to the analysis of the structure which a given industry would
eventually reach if certain current trends were to continue. This probabilistic
approach was first applied in economics to the analysis of income and wage
distributions.² More recently, the same technique was also employed by Hart
and Prais³ in an investigation of business concentration. In their article, Hart
and Prais presented matrices of transition probabilities for firms in British
industry. But they did not proceed to derive an equilibrium market structure
for manufacturing because, as they stated,⁴ of the difficulty of realistically
handling the phenomena of entry and exit of firms from the industry. However,
this obstacle can be overcome. As a result, it becomes possible to investigate
what shape the equilibrium size distribution of firms would assume were
past tendencies to persist. In this process, a different, dynamic concept of
equilibrium is introduced. Also, in addition to the derivation of the implications
for concentration, a measure of firm mobility is constructed along the lines
suggested by Prais⁵ in a paper on social mobility.

In view of the inherent importance of steel in the economy of the United 
States, as well as the preponderance of its oligopolistic form of organization 





* The author is indebted for valuable comments to R. Caves and S. Sosnick, as well as to the referees.

¹ For excellent discussions of Markov processes see W. Feller, An Introduction to Probability Theory and Its
pp. 56-66.










AMERICAN STATISTICAL ASSOCIATION JOURNAL, DECEMBER 1958


within our society, we shall use the suggested technique in order to describe the 
ultimate size composition and concentration of the steel industry in this coun- 
try. It will be seen that the configuration we find for this case is not at all 
unreasonable and is indeed in accord with the opinions of experts in the field. 


2. THE DERIVATION OF THE EQUILIBRIUM STRUCTURE FOR AN INDUSTRY 


We shall assume that the firms which comprise an industry are grouped ac- 
cording to some criterion of size into a number of classes. We then regard the 
evolution of a corporation through these classes as a stochastic process, in 
which the probability per unit time of movement from one group to another 
is a function only of the two groups involved. That is to say, the likelihood 
that a corporate entity will advance a given number of steps during a period 
depends solely upon its size at the beginning of the period and the number of 
steps involved, and is independent of the previous history of the firm. In our 
approach, then, the growth of an enterprise is statistical in nature, with ab- 
solute size as the determinant of development. 

Obviously, this model constitutes a considerable simplification. For we 
represent all those economic forces which determine the growth pattern of 
business organisms within a given industry by a single portmanteau variable— 
corporate size. This is tantamount to the assumption that such economic fac- 
tors as entrepreneurship, financial structure and position, proneness to intro- 
duce technological change, economies of scale, and profits are all strongly cor- 
related with size. Or, perhaps, that the magnitudes and behavior of the growth- 
promoting variables are more nearly homogeneous within a specified size group 
than they are from stratum to stratum. 

Another simplifying assumption is that the effects of the interactions among 
all these variables, which are summarized in our model by several size-depend- 
ent transition probabilities, are taken to remain invariant throughout the 
evolutionary process. While this is a strong restriction, it is analogous to that 
used in long-run comparative statics: that the forces which operate during the 
sample period will continue unchanged until equilibrium is reached. Actually, 
if the time period over which the transition probabilities are evaluated is suf- 
ficiently long and includes at least one complete business cycle, the use of this 
approximation may be expected to lead to qualitatively correct conclusions. 

Under these conditions, the historical development of the distribution of firm 
sizes in a given industry can be described by a process which is due to Markov. 
Basically (and in mathematical terms, for a moment), one arranges the transi- 
tion probabilities into a square matrix. By operating with this matrix upon a 
vector which represents the structure of the industry at the beginning of one 
period, one derives the structure for the next time interval. Repeating the
process indefinitely leads (under one further restriction) to a vector which de- 
scribes the equilibrium state. 

There is, however, one modification which must be made in the Markov 
process before we can use it profitably in our work—we must provide for entry 
into and departure from the industry. To do this, we add to our m size classes 
a large additional group which acts as a reservoir of potential entrants into the 
system. We then assign as the probability of moving from this zeroth group to, 
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say, the jth group a value just sufficient to make the average number of firms 
entering the jth class per year correspond to the actual number of new firms 
started annually in the appropriate size range. Similarly, the failure of a firm 
will be represented as a movement into the zeroth class. 

We now let p_ij denote the probability that a firm in size class i will, during
the next period, enter group j. Thus, for example, p_13 will represent the
likelihood that, in one time period, an enterprise of magnitude 1 will grow
sufficiently to be included in class 3; similarly, p_20 will stand for the probability
that, during a unit of time, a firm in size range 2 will go out of business for any
reason whatever.

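Using this notation, we arrange the transition probabilities p_ij into a matrix
which may be written as

$$
[P] = \begin{pmatrix}
p_{00} & p_{01} & p_{02} & \cdots & p_{0m} \\
p_{10} & p_{11} & p_{12} & \cdots & p_{1m} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
p_{m0} & p_{m1} & p_{m2} & \cdots & p_{mm}
\end{pmatrix}
$$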


Since each element of this matrix is non-negative, and since

$$\sum_{j=0}^{m} p_{ij} = 1 \qquad (1)$$

for each i, the matrix [P] is a stochastic matrix.⁶ But we must place an additional
restriction upon the shape of [P] before we can determine the ultimate size
distribution of firms in an industry: we require that all states be accessible. In
other words, a firm starting in any class i must have a non-zero probability of
moving into any other group j in a finite number of periods. In this event, [P]
will be a regular stochastic matrix,⁷ and we may use directly a number of
theorems which have been proved elsewhere concerning such matrices.
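Whether a given [P] meets this requirement can be checked mechanically. Strictly, accessibility alone makes a chain irreducible. Regularity, meaning that some power of [P] has only positive entries, also needs aperiodicity, and a positive diagonal entry guarantees that. The sketch below relies on Wielandt's bound: for n states it is enough to test the single power (n-1)² + 1. The matrices and helper names are hypothetical illustrations.

```python
def mat_mul(x, y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def is_regular(p):
    """True if some power of P is strictly positive.

    By Wielandt's bound it suffices to check the single power (n-1)**2 + 1,
    where n is the number of states.
    """
    n = len(p)
    power = p
    for _ in range((n - 1) ** 2):
        power = mat_mul(power, p)
    return all(v > 0 for row in power for v in row)


# Hypothetical 3-class industry in which firms move at most one class per period.
p_demo = [[0.90, 0.10, 0.00],
          [0.05, 0.90, 0.05],
          [0.00, 0.10, 0.90]]
print(is_regular(p_demo))                      # True: all classes reachable, positive diagonal
print(is_regular([[0.0, 1.0], [1.0, 0.0]]))    # False: accessible but periodic
print(is_regular([[1.0, 0.0], [0.5, 0.5]]))    # False: class 0 can never be left
```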

One such theorem states that there exists an equilibrium solution to the
Markov process.⁸ Furthermore, it has been shown that this equilibrium is
unique and independent of the initial configuration.⁹ That is, the repeated
application of the set of transition probabilities represented by [P] will cause any
initial distribution of firm sizes to approach this unique equilibrium state. 
Under the above assumptions, then, an industry—regardless of whether it was 
originally in a purely competitive state or in an oligopolistic one—will, given 
the same transition probabilities, assume the same ultimate type of organiza- 
tion. This result is, of course, merely the logical implication of the economic 
assumption that the subsequent development of a firm is independent of its 
past history. 

Before we derive the form of this equilibrium solution, however, we must 
examine in more detail the meaning of equilibrium in a Markov process. An 
equilibrium structure in this model may be defined as that distribution for 

⁶ See Kemeny, Snell and Thompson, op. cit., p. 217.
⁷ Ibid., p. 220.
⁸ Loc. cit.
⁹ Ibid., p. 221.
which the average number of corporations entering a given stratum per period
equals the average number of businesses leaving it. Our concept of equilibrium 
is thus statistical in nature for the industry, and dynamic for the individual 
firm. In other words, equilibrium in this paper does not imply that there is no 
movement of enterprises between strata. On the contrary, the stochastic con- 
ception of equilibrium explicitly requires that firms move in and out of each 
class. But on the average, the forces acting to increase the number of enter- 
prises in a given size range are exactly counterbalanced by those tending to 
decrease it. 

As stated above, the equilibrium solution may be derived, at least in principle,
by the repeated application of [P] to any initial distribution vector.
Symbolically, if the structure at time n is described by a row vector

$$(s_j^{n}) = (s_0^{n}, s_1^{n}, \ldots, s_m^{n}), \qquad (2)$$

the components of which represent the proportion of firms in each class at that
time, the configuration after the next time step may be found from

$$(s_j^{n}) \cdot [P] = (s_j^{n+1}). \qquad (3)$$

By successive substitutions, one may write

$$(s_j^{n}) \cdot [P]^{k} = (s_j^{n+k}). \qquad (4)$$
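Equations (3) and (4) can be followed literally: apply [P] to a starting distribution until it stops changing. The minimal sketch below uses a hypothetical 3-class matrix, not data from the paper, and starts from two opposite initial distributions. Both runs reach the same limit, as the uniqueness theorem cited above requires.

```python
def step(s, p):
    """One application of (3): s_j^{n+1} = sum_i s_i^n * p_ij."""
    return [sum(s[i] * p[i][j] for i in range(len(s))) for j in range(len(p[0]))]


def iterate_to_equilibrium(s, p, tol=1e-12, max_steps=100_000):
    """Apply [P] repeatedly, as in (4), until the distribution stops changing."""
    for _ in range(max_steps):
        nxt = step(s, p)
        if max(abs(a - b) for a, b in zip(nxt, s)) < tol:
            return nxt
        s = nxt
    raise RuntimeError("no convergence within max_steps")


# Hypothetical 3-class matrix (illustration only).
p_demo = [[0.90, 0.10, 0.00],
          [0.05, 0.90, 0.05],
          [0.00, 0.10, 0.90]]

from_small = iterate_to_equilibrium([1.0, 0.0, 0.0], p_demo)  # all firms start small
from_large = iterate_to_equilibrium([0.0, 0.0, 1.0], p_demo)  # all firms start large
print([round(v, 6) for v in from_small])  # → [0.25, 0.5, 0.25]
print([round(v, 6) for v in from_large])  # → [0.25, 0.5, 0.25]
```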


But to find the equilibrium vector by multiplying [P] by itself a large number
of times is, in the general case, a tedious process. A simpler approach is to make
use of the fact that, in equilibrium, the distribution of enterprises among
strata must be invariant. That is, for the equilibrium vector (t_j), we may
rewrite (3) as

$$(t_j) \cdot [P] = (t_j). \qquad (5)$$

Since (t_j) is an (m+1)-component vector, and [P] is a square matrix, (5)
provides us with a set of (m+1) equations, from which it would appear that we
can derive the (m+1) components of the vector (t_j). However, since (t_j)
represents a relative distribution, we must also have

$$\sum_{j=0}^{m} t_j = 1. \qquad (6)$$


We now have (m+2) equations (5) and (6) in (m+1) unknowns. But it can be 
shown that (any) one of the equations (5) is not linearly independent of the 
others and therefore that any one of these equations may be dropped from 
the system. Thus we are left with a set of (m+1) linearly independent equa- 
tions (if our assumption about accessibility is satisfied) in (m+1) unknowns, 
from which we can evaluate the equilibrium structure of the industry. 
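The procedure just described (write out (5), replace one of its equations by (6), and solve the square system) can be sketched as follows. A small Gaussian elimination keeps the example free of third-party packages. The helper names and the 3-class matrix are hypothetical. Dropping each of the three balance equations in turn yields the same equilibrium vector, which illustrates that any one of them may be discarded.

```python
def solve(a, b):
    """Gaussian elimination with partial pivoting for a small dense system a·x = b."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def equilibrium(p, drop):
    """Solve (5) with equation `drop` replaced by the normalization (6)."""
    n = len(p)
    # Equation j of (5): sum_i t_i * p_ij - t_j = 0, i.e. column j of [P] - I.
    a = [[p[i][j] - (1.0 if i == j else 0.0) for i in range(n)] for j in range(n)]
    b = [0.0] * n
    a[drop] = [1.0] * n
    b[drop] = 1.0
    return solve(a, b)


p_demo = [[0.90, 0.10, 0.00],   # hypothetical 3-class matrix, not from the paper
          [0.05, 0.90, 0.05],
          [0.00, 0.10, 0.90]]
for drop in range(3):
    print(drop, [round(v, 6) for v in equilibrium(p_demo, drop)])
```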


3. AN INDEX OF INDUSTRIAL MOBILITY 


The stochastic matrix [P] may be utilized further in order to construct an 
index of corporate mobility analogous to the index of social mobility of Prais.¹⁰
Intuitively, it would appear reasonable to express the concept of mobility in 


¹⁰ S. J. Prais, op. cit., pp. 58-63.
terms of the average period a representative firm remains in the same size
interval. For the more fluid the structure of an industry, the shorter will be the 
time before a typical corporation will move from one stratum to another. But 
the specification of an index requires also a reference situation to which a given 
industrial complex can be compared. We therefore take as our measure of 
corporate mobility the ratio of the average number of years spent in a class in a 
perfectly mobile industry (to be defined below) to the corresponding quantity 
for the industry in question. 

The mean lifetime L_j of a corporation in the jth stratum may be found by
noting that the total time spent in an interval by all the s_j^0 firms originally
included therein is given by

$$T_j = s_j^0 + s_j^0\,p_{jj} + s_j^0\,p_{jj}^2 + \cdots. \qquad (7)$$

Therefore, the average firm will remain in the jth level for a period

$$L_j = \frac{T_j}{s_j^0} = \frac{1}{1 - p_{jj}}. \qquad (8)$$
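The step from (7) to (8) is the sum of a geometric series. The sketch below compares the closed form with a long truncated sum of (7). It uses p_11 = .911, the stay-put probability the steel matrix reports for the smallest active class, and both values come to about 11.2 years. The function names are illustrative.

```python
def mean_lifetime(p_jj):
    """Closed form (8): expected periods spent in class j, counting the first."""
    return 1.0 / (1.0 - p_jj)


def mean_lifetime_series(p_jj, terms=10_000):
    """Truncated form of (7) divided by s_j^0: 1 + p_jj + p_jj**2 + ..."""
    return sum(p_jj ** k for k in range(terms))


print(round(mean_lifetime(0.911), 2))         # → 11.24
print(round(mean_lifetime_series(0.911), 2))  # → 11.24
```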


Before we can write down our index of corporate mobility, however, it is 
necessary to evaluate the lifetimes for an industry in which movement is
uninhibited. In general (à la Prais),¹¹ we define a perfectly mobile corporate
structure as one for which the probability that a firm will move from class A to
class B during a single period is independent of A. With this definition, each 
column of the transition matrix [P] for a perfectly mobile industry of m size 
classes is composed of m identical positive numbers, and as usual, the sum of 
the elements of each row is unity. 

There are, in principle, an infinite number of perfectly mobile industries
which may be used for comparison. Of these, there is precisely one whose
equilibrium structure is identical to that which will be reached by our
particular group of firms. This perfectly mobile industry, which we choose for our
standard of mobility, has the transition matrix

$$
[\pi] = \begin{pmatrix}
t_0 & t_1 & \cdots & t_m \\
t_0 & t_1 & \cdots & t_m \\
\vdots & \vdots & & \vdots \\
t_0 & t_1 & \cdots & t_m
\end{pmatrix}. \qquad (9)
$$

The index of industrial mobility for time n may then be written as

$$
I_n = \frac{\displaystyle\sum_{j=0}^{m} \frac{s_j^{n}}{1 - t_j}}{\displaystyle\sum_{j=0}^{m} \frac{s_j^{n}}{1 - p_{jj}}}. \qquad (10)
$$
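The matrix in (9) and the index in (10) translate directly into code. In the sketch below, every row of the perfectly mobile matrix is the equilibrium vector t, so t reproduces itself under one step of that matrix. The index weights each class's perfectly mobile lifetime 1/(1 - t_j) and actual lifetime 1/(1 - p_jj) by the distribution s_j^n. The helpers and numbers are hypothetical. The functions sum over whatever classes are passed in, so excluding the zeroth class only means passing classes 1 to m with t renormalized.

```python
def perfectly_mobile(t):
    """Matrix (9): every row equals t, so the chance of entering class j ignores the origin."""
    return [list(t) for _ in t]


def step(s, p):
    """One period of the chain: s_j' = sum_i s_i * p_ij."""
    return [sum(s[i] * p[i][j] for i in range(len(s))) for j in range(len(p[0]))]


def mobility_index(s, p_diag, t):
    """Index (10): perfectly mobile lifetimes over actual lifetimes, weighted by s_j."""
    mobile = sum(sj / (1.0 - tj) for sj, tj in zip(s, t))
    actual = sum(sj / (1.0 - pjj) for sj, pjj in zip(s, p_diag))
    return mobile / actual


t_demo = [0.25, 0.50, 0.25]       # hypothetical equilibrium vector
p_diag_demo = [0.90, 0.90, 0.90]  # hypothetical stay-put probabilities p_jj
pi = perfectly_mobile(t_demo)
print(step(t_demo, pi))                                    # → [0.25, 0.5, 0.25]
print(round(mobility_index(t_demo, p_diag_demo, t_demo), 4))  # → 0.1667
print(mobility_index(t_demo, t_demo, t_demo))              # → 1.0 (perfectly mobile case)
```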


4. THE STEEL INDUSTRY 


We shall now illustrate the procedure of the preceding sections by applying
the assumptions and tools developed there to the derivation of the consequent
equilibrium structure of the steel industry in the United States. For the
purposes of this study, this industry was defined as that group of enterprises with
assets exceeding $1 million, whose major activity consists of the production of
pig iron, steel ingots and basic steel shapes; the registry of these firms was
compiled from Moody’s Manual of Industrials.¹² Two considerations led us to select
this sector for our investigation. First of all, the steel industry is of basic
importance to the nation’s economy. Secondly, the present market structure
within this field of activity is fairly typical of that of industry in general. For,
as is well known, the current form of organization of the steel industry is
oligopolistic in character, consisting of fewer than five dominant firms together
with a large fringe of smaller enterprises. This latter circumstance suggests, in
addition, that the nature of the equilibrium configuration predicted for steel
might well be of interest as an indication of equilibrium tendencies for United
States manufacturing industry.

¹¹ S. J. Prais, op. cit., pp. 59, 61.

We therefore proceeded to derive the shape predicted by our model for the 
equilibrium size distribution of firms engaged in the manufacturing of iron and 
steel in the United States. A first step in this direction was the quantification
of the matrix of transition probabilities [P]. 

Before we could accomplish this task, however, we had to select an index of 
corporate size. Our choice was the dollar value of a firm’s total assets, as listed 
in Moody’s Manual of Industrials. This indicator was preferred over the ob- 
vious alternatives (such as, e.g., the number of employees, the dollar value of 
sales, or net value added) primarily because it was easier to obtain this type of 
data. However, the choice of index of size is not crucial to the applicability of 
the technique. Furthermore, it would be reasonable to expect that all indices 
of size would, in practice, be highly correlated. 

The years selected for our study of the steel industry were 1929-39 and 1945-56.
These two decades were chosen in order to have a fairly long, cyclically well
balanced period, for which reasonably homogeneous and reliable statistical 
data would be available. The war years were omitted from our investigation in 
order not to bias our results. 

Next, we divided the continuous scale of firm asset sizes into seven discrete 
ranges. Two problems emerged in fixing these ranges. First, we would expect 
that a firm’s ability to change its (asset) size during a given period would be 
related to its initial size. Larger firms would be likely to grow by greater ab- 
solute amounts than smaller ones. Hence, the class intervals were constructed 
so that their absolute width was greater for large than for small enterprises. 
Secondly, we were faced with the problem of statistical deflation. For, it would 
appear unreasonable to assume a priori that a prewar firm with 10 million
dollars’ worth of assets is necessarily equal in magnitude to a 10 million dollar 
post-war enterprise. To achieve greater inter-period comparability, then, we 
required that the class limits of a given size range represent the same per- 
centage of the industry’s total assets in both 1934 and 1950, the midpoints of 
the two time periods. Therefore, since the ratio of the value of industry assets in 
1934 to their value in 1950 was .60, we were led by the above considerations to 
the choice of class intervals given in Table 899. 
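The deflation rule can be made concrete. The prewar class limits are the postwar ones multiplied by the asset ratio .60, so a firm is placed in a class by comparing its assets with the lower limits for its period. The sketch below is a hypothetical helper, not the author's procedure. It ignores the upper bound of class 6, and it reads the postwar lower limits from Table 899.

```python
from bisect import bisect_right

# Lower class limits for 1946-56 in millions of dollars (Table 899), classes 0..6.
POSTWAR_LOWER = [0, 1, 10, 50, 100, 500, 1000]
# Ratio of the value of industry assets in 1934 to their value in 1950.
DEFLATOR = 0.60
PREWAR_LOWER = [round(b * DEFLATOR, 6) for b in POSTWAR_LOWER]  # 0, .6, 6, 30, 60, 300, 600


def size_class(assets_millions, prewar):
    """Map a firm's total assets to a class index using its period's lower limits."""
    lowers = PREWAR_LOWER if prewar else POSTWAR_LOWER
    return bisect_right(lowers, assets_millions) - 1


print(size_class(10, prewar=False), size_class(10, prewar=True))  # → 2 2
print(size_class(40, prewar=True), size_class(40, prewar=False))  # → 3 2
```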





¹² Moody's Manual of Investments, Industrial Securities (New York: 1930-1957),
TABLE 899
CLASS INTERVALS

          Class Limits, 1929-39        Class Limits, 1946-56
Class    (in millions of dollars)     (in millions of dollars)
  0          0    -     .599              0    -     .99
  1         .6    -    5.99               1    -    9.99
  2          6    -   29.99              10    -   49.99
  3         30    -   59.99              50    -   99.99
  4         60    -  299.99             100    -  499.99
  5        300    -  599.99             500    -  999.99
  6        600    - 2999.99            1000    - 4999.99

With this selection of asset ranges, we traced the year-to-year growth pattern
of each domestic steel firm in terms of its movements within the class intervals
0 through 6. In this connection, mergers were treated in a manner analogous
to outright sale, i.e. as the disappearance of one enterprise and the
aggrandisement of another. Generally speaking, it was assumed in this connection
that it was the smaller firm that lost its corporate identity. Now, if a_ij denotes
the number of movements of firms from class i to class j throughout the period
under consideration, our transition probabilities p_ij become

$$p_{ij} = \frac{a_{ij}}{\sum_{k=0}^{m} a_{ik}}. \qquad (11)$$


Of course, if a corporation stayed in the same asset class j during two successive
years, this event was treated as an observation of type a_jj and its probability
denoted by p_jj. Naturally, expression (11) represents merely an empirical
relative frequency approximation to the true probability, which is inherently a
limiting concept.
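Expression (11) amounts to counting moves and dividing by row totals. The sketch below tallies the a_ij from year-by-year class sequences, one list per firm, and turns them into relative frequencies. The histories are hypothetical. A 0 marks departure from the industry, which under the convention above also covers absorption by merger. Row 0 stays empty here because potential entrants cannot be observed, which is the gap the next paragraph addresses.

```python
from collections import Counter


def count_transitions(histories):
    """Tally a_ij: year-to-year moves from class i to class j, one history per firm."""
    a = Counter()
    for classes in histories:
        for i, j in zip(classes, classes[1:]):
            a[(i, j)] += 1
    return a


def transition_probabilities(a, m):
    """Relative frequencies (11); rows with no observations are left as zeros."""
    p = [[0.0] * (m + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        total = sum(a[(i, j)] for j in range(m + 1))
        if total:
            for j in range(m + 1):
                p[i][j] = a[(i, j)] / total
    return p


# Hypothetical histories (class index per year); 0 marks exit from the industry.
histories = [
    [1, 1, 2, 2, 2],  # grows from class 1 into class 2
    [2, 2, 3],        # grows from class 2 into class 3
    [1, 0],           # leaves the industry after one year
]
p = transition_probabilities(count_transitions(histories), m=3)
print(p[1])  # class 1: one stay, one move up, one exit, each with probability 1/3
print(p[2])  # → [0.0, 0.0, 0.75, 0.25]
```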

The above definition of p_ij, however, still leaves the probabilities p_0j
arbitrary. For, since by the very nature of the case no data on the number of
businesses retaining the status of potential entrants could be collected, a_00
could not be evaluated empirically. This deficiency was remedied by assuming
that

$$\sum_{j=0}^{m} a_{0j} = 100{,}000.$$

Our choice of number was guided by the desire to keep the reservoir of incipient
enterprises large by comparison with the number of corporations actually within
the industry. But this arbitrary selection does not affect the economically
relevant portion of our results.¹³
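The claim that the size of the reservoir does not matter, for which footnote 16 gives a determinant argument, can also be checked numerically. The sketch below uses a hypothetical three-class industry, with invented active-class rows and yearly entrant counts. Row 0 is built as p_0j = (entrants into j)/N, and p_00 takes up the remainder. Solving the equilibrium for two values of N changes t_0 but leaves the shares of the active classes unchanged.

```python
def solve(a, b):
    """Gaussian elimination with partial pivoting for a small dense system a·x = b."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def equilibrium(p):
    """Solve (5) with its last equation replaced by the normalization (6)."""
    n = len(p)
    a = [[p[i][j] - (1.0 if i == j else 0.0) for i in range(n)] for j in range(n)]
    b = [0.0] * n
    a[-1], b[-1] = [1.0] * n, 1.0
    return solve(a, b)


# Hypothetical rows for active classes 1..3, each listing p_i0 .. p_i3.
ACTIVE_ROWS = [
    [0.05, 0.85, 0.10, 0.00],
    [0.03, 0.05, 0.87, 0.05],
    [0.00, 0.00, 0.06, 0.94],
]
NEW_FIRMS_PER_YEAR = [3, 1, 0]  # hypothetical entrants into classes 1..3


def industry_matrix(reservoir_size):
    """Prepend row 0: p_0j = entrants_j / N, with p_00 taking up the remainder."""
    entry = [e / reservoir_size for e in NEW_FIRMS_PER_YEAR]
    return [[1.0 - sum(entry)] + entry] + ACTIVE_ROWS


def active_shares(t):
    """Renormalize t over the active classes 1..m."""
    total = sum(t[1:])
    return [v / total for v in t[1:]]


t_small_n = equilibrium(industry_matrix(100_000))
t_large_n = equilibrium(industry_matrix(1_000_000))
print(round(t_small_n[0], 6), [round(v, 6) for v in active_shares(t_small_n)])
print(round(t_large_n[0], 6), [round(v, 6) for v in active_shares(t_large_n)])
```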

The statistics upon which our p_ij were based were considerably better than
might be inferred from the fact that the average number of firms engaged in
steel production was around 100. For, during the period of our investigation, a
typical firm experienced 21 transitional movements¹⁴ of the type a_ij.
Consequently, the total number of observations underlying the quantification of the
p_ij was almost 2100.

¹³ See footnote 16 for proof.
¹⁴ The transitional movements of firms between 1939 and 1946 were not included in our data.

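Given the computed values of the p_ij, the resulting matrix of transition
probabilities for the steel industry in the United States becomes

$$
[P] = \begin{pmatrix}
.99942 & .00040 & .00016 & .00001 & .00001 & 0 & 0 \\
.021 & .911 & .068 & 0 & 0 & 0 & 0 \\
.024 & .039 & .908 & .028 & .001 & 0 & 0 \\
0 & 0 & .076 & .872 & .052 & 0 & 0 \\
.008 & 0 & 0 & .016 & .947 & .029 & 0 \\
0 & 0 & 0 & 0 & .037 & .926 & .037 \\
0 & 0 & 0 & 0 & 0 & .024 & .976
\end{pmatrix}
$$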








An examination of this matrix [P] reveals several interesting facts. First of
all, as might have been expected, the most probable outcome for each firm
is that it will remain in the same class interval. For, the diagonal terms p_jj are
uniformly very much larger than the other p_ij (for i ≠ j). Secondly, our
observations bear out Marshall’s dictum “natura non facit saltum.” Those firms
which survive generally move up or down one asset range at a time. Of course,
this conclusion is weakened by the fact that, in view of our selection of class
intervals, it would require an extraordinarily high rate of growth for a firm to
move from class j to class (j+2) in a single year. Thirdly, entry occurs
predominantly into the lowest two asset ranges. During the entire period of our
investigation only one $200 million firm (Kaiser Steel, in 1951) and one $70
million firm (M.K. Porter Co., in 1955) were formed. On the other hand, failure
of small firms would appear to be considerably more probable than that of
large ones, with the probabilities of failure approximately uniform for firms
possessing assets of less than $50 million. Indeed, no firm of size 3 or larger
failed during the 23 years of our study, inasmuch as the two firms¹⁵ with assets
of approximately $100 million which disappeared in the pre-war years did so as
a result of mergers. This may be due to the fact that large firms merely
reorganize.

Finally, we are in a position to find the equilibrium configuration of firm
sizes predicted by our model for the steel industry. For, if we substitute our
empirically derived [P] into relationship (5), and replace the fourth equation
of (5) by (6) as explained above, we obtain a set of 7 independent simultaneous
equations which can be solved for the equilibrium values of the t_j
(j = 0, ..., 6). The latter, of course, represent the relative frequency
distribution of firms among our asset strata in the equilibrium state. With this
procedure, our set of simultaneous equations becomes


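$$
\begin{pmatrix}
-.00058 & .021 & .024 & 0 & .008 & 0 & 0 \\
.00040 & -.089 & .039 & 0 & 0 & 0 & 0 \\
.00016 & .068 & -.092 & .076 & 0 & 0 & 0 \\
.00001 & 0 & .001 & .052 & -.053 & .037 & 0 \\
0 & 0 & 0 & 0 & .029 & -.074 & .024 \\
0 & 0 & 0 & 0 & 0 & .037 & -.024 \\
1 & 1 & 1 & 1 & 1 & 1 & 1
\end{pmatrix}
\begin{pmatrix} t_0 \\ t_1 \\ t_2 \\ t_3 \\ t_4 \\ t_5 \\ t_6 \end{pmatrix}
=
\begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 1 \end{pmatrix}. \qquad (13)
$$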


¹⁵ These firms were Central Alloy, which consolidated with Republic Steel in 1930, and Tennessee Coal, which
was acquired by U. S. Steel in 1935.
The first six equations in (13) come from (5) and the last is equation (6). This
system can easily be solved to yield the equilibrium vector 


t = (.9480, .00938, .01169, .00376, .00903, .00708, .01091). 
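To check the arithmetic of (13) end to end, the sketch below rebuilds the steel matrix [P] from the printed probabilities. It then forms the equations of (5), replaces the fourth one by (6), and solves the system. Where the scan of [P] is incomplete, the entries are filled in from the coefficients of (13) and from the requirement that each row sum to one (p_45 = .029, p_56 = .037, p_66 = .976). That completion is an assumption, and the helper names are hypothetical.

```python
def solve(a, b):
    """Gaussian elimination with partial pivoting for a small dense system a·x = b."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def equilibrium(p, drop):
    """Solve (5) with equation `drop` replaced by the normalization (6)."""
    n = len(p)
    a = [[p[i][j] - (1.0 if i == j else 0.0) for i in range(n)] for j in range(n)]
    b = [0.0] * n
    a[drop], b[drop] = [1.0] * n, 1.0
    return solve(a, b)


# Transition matrix [P] for classes 0..6 (rounded estimates; completion assumed, see above).
P_STEEL = [
    [0.99942, 0.00040, 0.00016, 0.00001, 0.00001, 0.0, 0.0],
    [0.021, 0.911, 0.068, 0.0, 0.0, 0.0, 0.0],
    [0.024, 0.039, 0.908, 0.028, 0.001, 0.0, 0.0],
    [0.0, 0.0, 0.076, 0.872, 0.052, 0.0, 0.0],
    [0.008, 0.0, 0.0, 0.016, 0.947, 0.029, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.037, 0.926, 0.037],
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.024, 0.976],
]

t_steel = equilibrium(P_STEEL, drop=3)  # fourth equation of (5) replaced by (6)
active = sum(t_steel[1:])
shares_pct = [100 * v / active for v in t_steel[1:]]  # renormalized over classes 1..6
print([round(v, 5) for v in t_steel])
print([round(v, 2) for v in shares_pct])
```

The solution agrees with the printed vector t to within a few units in the fourth decimal place. Renormalizing over classes 1 to 6 reproduces the equilibrium percentages of firms discussed below to within rounding.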


But, since our stratum 0 was entirely arbitrary, the relationship between the
proportion of firms in this category and the rest of the industry is uninteresting.
A more meaningful picture can be obtained if we consider solely the relative
distribution of firms actually active in the industry. To do so, we need only
normalize our results so that

$$\sum_{j=1}^{6} t_j = 1.$$

The consequent equilibrium distribution of firms in the industry is then
summarized in the last two columns of Table 901, along with that actually
observed in 1929 and in 1956. Note that this distribution is independent of the
choice of the number of firms in the zeroth class.¹⁶

TABLE 901
ASSET DISTRIBUTION WITHIN THE STEEL INDUSTRY

                1929ᵃ                1956ᵃ              Equilibrium
Stratum   % Firms  % Assets    % Firms  % Assets    % Firms  % Assetsᵇ
   1       25.00      1.15      27.68       .98      18.09
   2       43.47      8.64      39.29      6.79      22.55
   3       16.30      9.49       8.93      4.36       7.25
   4       11.96     24.70      16.96     21.90      17.42
   5        1.09      5.42       5.36     26.86      13.65     12.54
   6        2.18     50.60       1.79     39.11      21.04     82.14
 Total    100.00    100.00     100.00    100.00     100.00    100.00

ᵃ Source of data: disaggregated data from Moody's Manual of Industrials.
ᵇ This column was computed on the assumption that the mean firm in each stratum will possess the same assets
in equilibrium as it did in 1956.

As is evident from this table, our results indicate that considerable growth
in the size of the median firm might be expected. For, while
both in 1929 and 1956 the median occurred in the second stratum, in equilibrium 
the median will be at the beginning of the fourth asset range. And, if we apply 
the 1956 class limits to the equilibrium state, our model predicts that the 
median steel firm existing at that time will possess $150 million worth of assets,
as compared to $30 million in 1956 and $22 million in 1929. It should be noted, however, that
the figure of $150 million represents a minimal estimate, inasmuch as it was 





¹⁶ The proof of this statement can be seen readily from the schematic solution of the equations (13) for t_j by
determinants. Since each of the elements of the first column of the determinant of coefficients (Δ) except the last is
inversely proportional to the number of firms (N) assumed to be in the zeroth group, Δ will be a function of N. But
Δt₀ will be independent of N, as Δt₀ is the determinant obtained by replacing the first column of Δ by a column
whose elements are zero except for the last (which is unity), and no other elements of Δt₀ depend on N. Thus
t₀ = Δt₀/Δ will be a function of N. The rest of the quantities Δt_j (j > 0) will all be proportional to 1/N, since the
reduction of the 7 × 7 determinant to a 6 × 6 determinant by cancelling out the jth column and the 7th row leaves a
determinant in which each of the elements of the first column is proportional to 1/N. Thus, for i, j both greater than zero,

$$\frac{t_i}{t_j} = \frac{\Delta t_i}{\Delta t_j}$$

is independent of N.
based upon the assumption of no further growth of the industry as a whole
after 1956. Qualitatively, at least (and this is all that is hoped from this study) 
this conclusion would appear to be reasonable. In fact, students of the steel 
industry have generally been of the opinion that a marked increase in corporate 
size coupled with an augmented degree of both vertical and horizontal in- 
tegration might be anticipated on technological grounds. Interestingly enough, 
the application of our technique to the data reveals the existence of forces 
operating in this direction, in spite of the fact that, in real terms, no growth 
in median firm size has actually taken place between 1929 and 1956. 
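The location of the median firm follows from cumulating the percentages of firms in Table 901 until half the firms are reached. The sketch below reports the stratum and the fraction of that stratum's firms lying below the median. This is a fraction of firms, not a position within the asset range. The helper name is hypothetical.

```python
# Percentage of firms by stratum 1..6, from Table 901.
PCT_FIRMS = {
    "1929": [25.00, 43.47, 16.30, 11.96, 1.09, 2.18],
    "1956": [27.68, 39.29, 8.93, 16.96, 5.36, 1.79],
    "equilibrium": [18.09, 22.55, 7.25, 17.42, 13.65, 21.04],
}


def median_stratum(pcts):
    """Return (stratum, fraction of its firms below the median) where the cumulative share reaches one half."""
    half = sum(pcts) / 2
    cum = 0.0
    for stratum, pct in enumerate(pcts, start=1):
        if cum + pct >= half:
            return stratum, (half - cum) / pct
        cum += pct
    raise ValueError("percentages must be positive")


for year, pcts in PCT_FIRMS.items():
    stratum, frac = median_stratum(pcts)
    print(year, stratum, round(frac, 2))
```

The median falls in the second stratum in 1929 and 1956. In equilibrium it falls only a little way into the fourth, which matches the description "at the beginning of the fourth asset range."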

Secondly, while our conclusions with respect to the degree of concentration 
in steel must be qualified by the fact that our study provided no opportunity 
for a corporation to expand beyond asset class 6, they point towards a decrease 
in the degree of concentration prevailing in this industry. For, if we assume 
that the mean value of assets in each class interval is the same in equilibrium 
as it was in 1956, the number of firms holding 50% of industry assets predicted 
for the equilibrium state is 13, as compared with 2 in 1929 and 4 in 1956. Fur- 
thermore, with the same hypothesis, we obtain the dotted Lorenz curve of 
Fig. 902 for the stationary state, as compared to the two solid lines for 1929 
and 1956. Thus, an increase in the degree of competition would appear to be 
foreshadowed for the steel industry. But our assumptions would tend to lead us 
to underestimate the equilibrium degree of concentration. 

As explained in section 3, our data may also be utilized for the investigation
of trends in the mobility of firms in the steel industry. The second column of
Table 903 presents the average number of years spent by a representative
corporation in each asset range; the entries in this column were computed by
applying equation (8) to the appropriate probabilities in transition matrix [P].
Note that the mean lifetime of firms in the zeroth class was omitted from our
table. This procedure was adopted in spite of the fact that it would be tempting
to interpret such a figure as an indication of barriers to entry, because p_00
(and hence L_0) is directly dependent upon the arbitrarily selected magnitude
of the zeroth asset range. For the same reason, our computations of the L_j for
the perfectly mobile industry (see Table 903) were based upon a modified
matrix [π] from which the zeroth class interval was excluded, and the t_j
normalized accordingly.

[Fig. 902. Lorenz curves of the steel industry: 1929 and 1956 (solid lines) and the equilibrium state (dotted line).]

TABLE 903
LIFETIME AND MOBILITY INDICES IN THE U. S. STEEL INDUSTRY

Stratum | Mean Lifetime L_j (years) | Distribution of Firms: 1929, 1956, Equilibrium
Index of Mobility

Source of data: See text.
* Omitting the war years 1940-4

The substantial equality of mean lifetimes for all firm sizes in a perfectly 
mobile industry is interesting. However, since the width of our class intervals 
strongly conditions the numerical values obtained for the L_j, it is only the ratios
of the mean lifetimes in actuality to those in a corresponding perfectly mobile
industry which are of economic interest. These ratios indicate that the giant
corporations are considerably less mobile than the rest of the firms. The staying
power of the giants, however, may be overestimated due to our failure to pro- 
vide a seventh class into which these firms might grow. 

The bottom row of Table 903 consists of indices of mobility for 1929, 1956
as well as for the equilibrium state. These indices were, of course, computed by 
applying our definition (10) to the data in the table. Generally speaking, the 
mobility of firms in the steel industry during the sample period was approxi- 
mately 10% of that in a perfectly mobile corporate structure. Furthermore, 
these indices would tend to indicate the existence of a consistent trend for a 
decline in the mobility of steel producers. This tendency would appear to be a 
consequence of the decrease in concentration. For, the presence of a larger 
proportion of all firms in the least mobile uppermost class interval entails both 
some deconcentration and some loss of mobility. 
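The mobility calculation can be sketched from figures that appear elsewhere in the paper: the diagonal of [P] for classes 1 to 6 and the distributions of firms in Table 901. Following the text's treatment of Table 903, the sketch assumes that the sums in (10) run over classes 1 to 6, with t renormalized over those classes. It does not claim to reproduce the printed entries of Table 903. With these inputs, the 1929 and 1956 indices come out near the 10% level reported above, and the equilibrium value is lower, which matches the described decline. The helper names are hypothetical.

```python
# Stay-put probabilities p_jj for active classes 1..6, from the diagonal of [P].
P_DIAG = [0.911, 0.908, 0.872, 0.947, 0.926, 0.976]

# Distributions of firms (fractions) by stratum 1..6, from Table 901.
DIST = {
    "1929": [0.2500, 0.4347, 0.1630, 0.1196, 0.0109, 0.0218],
    "1956": [0.2768, 0.3929, 0.0893, 0.1696, 0.0536, 0.0179],
    "equilibrium": [0.1809, 0.2255, 0.0725, 0.1742, 0.1365, 0.2104],
}
T_ACTIVE = DIST["equilibrium"]  # t_j renormalized over classes 1..6


def lifetimes(stay):
    """Equation (8) applied class by class: L_j = 1 / (1 - stay_j)."""
    return [1.0 / (1.0 - q) for q in stay]


def mobility_index(s, stay, t):
    """Index (10) restricted to the active classes."""
    mobile = sum(sj * lj for sj, lj in zip(s, lifetimes(t)))
    actual = sum(sj * lj for sj, lj in zip(s, lifetimes(stay)))
    return mobile / actual


print("actual L_j:", [round(v, 1) for v in lifetimes(P_DIAG)])
print("perfectly mobile L_j:", [round(v, 2) for v in lifetimes(T_ACTIVE)])
for label, s in DIST.items():
    print(label, round(mobility_index(s, P_DIAG, T_ACTIVE), 3))
```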


5. SUMMARY AND CONCLUSIONS 


In this paper, we have presented a technique for the derivation of the equilib- 
rium size distribution of corporations within an industry. Our basic postulate 
is that the growth pattern of firms is a size-dependent stochastic process, with 
probabilities of transition constant in time. As discussed above, the structure
of the industry will generally tend towards a unique equilibrium state which is 
independent of the initial configuration. Since the assumptions upon which the 
analysis is based were seen to be reasonable, the results obtained with the 
Markovian approach are of considerable interest as an indication of equilibrium 
tendencies in industry. 
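The claim that the industry tends toward a unique equilibrium independent of its starting point can be checked numerically. The sketch below uses a hypothetical three-class matrix with constant transition probabilities and projects two very different initial size distributions forward period by period; for a chain in which every class can eventually reach every other and firms can remain in their class (as here), both projections approach the same vector.

```python
# Sketch: projecting size distributions forward under a constant (hypothetical) transition matrix.

def step(dist, P):
    """One period: share in class j next period = sum over i of share_i * p_ij."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]


def project(dist, P, periods):
    """Apply the same transition matrix for the given number of periods."""
    for _ in range(periods):
        dist = step(dist, P)
    return dist


# Hypothetical transition probabilities among small, medium, and large firms.
P = [[0.90, 0.08, 0.02],
     [0.05, 0.85, 0.10],
     [0.01, 0.04, 0.95]]

mostly_small = [0.80, 0.15, 0.05]
mostly_large = [0.05, 0.15, 0.80]

a = project(mostly_small, P, 500)
b = project(mostly_large, P, 500)
print([round(x, 4) for x in a])
print([round(x, 4) for x in b])  # same vector: the starting configuration has been forgotten
```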

As a test of this technique, we examined the steel industry in the United 
States. We found that one might expect a tendency towards deconcentration, 
as well as a growth in the size of the median firm. Since both of these trends 
have been forecast on other grounds, it would appear that the application of the 
technique presented here does not lead to absurd results, and, therefore, that 
it may prove quite useful in the study of industrial structure. 

Furthermore, one might be able to investigate other problems in industrial 
organization, such as that of plant location, etc., in a similar manner. Indeed, 
any corporate characteristic which can be quantified can, in principle, be 
analyzed in the same way, provided only that all the conditions stated in the 
text are satisfied. 





SOME ANALYSES OF INCOME-FOOD RELATIONSHIPS 


Marguerite C. Burk
U. S. Department of Agriculture


This article begins with consideration of the concepts of food to be 
used in the analysis of income-food relationships and reviews time- 
series and cross-section data matching these concepts. A number of 
income elasticities for food are derived and examined for comparability 
from statistical and economic points of view. Limitations of our 
knowledge of income-food relationships are demonstrated by compari- 
son of expected values and actual values for several food measures in 
1955 and by consideration of changes in the market value of food from 
1941 to 1955. 


ANALYSIS of income-food relationships is no longer a relatively simple
statistical problem, if it ever was. Development and elaboration of mean-
ings of food, a great accumulation of varied data, and major developments in
pertinent statistical knowledge contribute to the growing complexity of research
in this area. Choices among alternative concepts, sets of data, and statistical
techniques are vitally important to the reliability and applicability of findings
on income-food relationships to market research and macro-economic planning.


The objectives of this article¹ are to describe and differentiate among a
number of measures of food consumption through time and at specific points in
time, to compare simple relationships between these measures and income, 
and then to demonstrate their application in projections for 1955 from prewar 
relationships and in an analysis of changes in the market value of food from 
1941 to 1955. The formulation of econometric models and consideration of 
alternative statistical methods are outside the scope of this article [12]. 

The following topics are considered: (1) Major types of data pertaining to 
income-food relationships and what these data measure; (2) alternative con- 
cepts and time-series measures for income and for food; (3) review of (a) time- 
series and (b) cross-section measures of relationships between income and all 
food combined; (4) comparisons of such relationships and study of factors con- 
tributing to differences, including measurement of their effects; and (5) ex- 
amples of the application of such data on overall income-food relationships. 

Because of the problem of matching precisely the populations covered by in-
come and food data, it is generally more satisfactory to use average rates of
income, food expenditures, or food consumption per capita, as in concurrent
application of time-series and cross-section data.² However, comparisons of





1 It follows the general lines of the paper presented to the Business and Economic Statistics Section of the
American Statistical Association, Sept. 10, 1957, at Atlantic City, N. J. (Proc., pp. 101-17), but it incorporates the
results of further research on the basic data and on statistical tests of the significance of differences in relationships. 

2 In this article, the term “time-series data” refers to such data as (1) series of annual data on food consumption,
e.g., those published by the Agricultural Marketing Service in the National Food Situation [14]; (2) series on farm
value and retail cost of farm foods for each year, given in the Marketing and Transportation Situation [13]; (3)
series on annual income and expenditures published by the Department of Commerce in the Survey of Current
Business and its supplements, National Income [15]. “Cross-section data” are those from sample surveys of family
income, expenditures, and consumption, either continuing through time or one-time only. 
findings based on global, all U. S. time-series data, such as the disappearance
data on food consumption, and on data derived from sample surveys of a cross- 
section of families require careful evaluation of the universes covered and of 
the effect of varying time periods. 

Although there is essentially no difference in meaning, the term “per capita” 
is used here to describe time-series data and the term “per person” is reserved 
for survey data. This usage serves as a constant reminder of differences in 
coverage. 

1. TYPES OF DATA ON INCOME-FOOD RELATIONSHIPS 


Three general types of data can be used to measure income-food relation- 
ships: (1) time series of income and value and quantity data for food; (2) data 
from cross-section surveys through time; and (3) data from cross-section sur- 
veys for single periods. 


Time-Series Data 

Time-series aggregates and per capita averages for income, food consump- 
tion, market value of all food, and food expenditures cover the whole civilian 
population. The food consumption data include all food consumed in private 
homes and away from home in institutions and eating places of all kinds. Most
such data are annual, so they do not reveal seasonal changes in consump-
tion. Despite the fact that these data are relatively well known, the researcher
must not forget that their estimation involves many statistical problems.³

These time-series data reflect cyclical and evolutionary changes, including 
technological developments, major changes in the market structure, and 
changes in consumer preferences. For example, changes in total dollar outlays 
for food include increasing costs of marketing more and more of the food supply 
in and through congested urban centers, in part replacing home production in 
rural areas and marketing in small towns close to sources of supply. Changes in 
income-food relationships through time reflect changes in demand for food 
attributable to factors other than income, such as changes in the way we live 
and population shifts, as well as food supply and price relationships which are 
difficult to exclude because of lack of data. 


Cross-Section Surveys Through Time 


Cross-section surveys through time may be based on panels of consumers 
or repeated sampling, such as the National Food Survey of the United King-
dom [5]. Such surveys reflect changes in demand and supply of foods in major
segments of the food market, but have their own set of problems. Problems 
encountered in panel data include the possibility of bias due to educating the 
reporting families, loss of randomness due to “fall out,” and some difficulty in 
keeping track of changing economic and social characteristics of the sample. 
The writer has not had access to a full range of panel data, and none of the
panels covers all foods, so their analysis is not included.





3 The regularly published series are described in Consumption of Food in the United States [3] and in the 1958
edition of the National Income Supplement, called United States Income and Output [15]. New series on market value 
of food and related measures are described in Part 2 of this article. 
Cross-Section Surveys for Single Periods


Data from the one-time surveys of food use by households and consumer 
units made over the last two decades provide the basis for many analyses of 
income-food relationships. Comparison of relationships derived from two or 
more single-period surveys is greatly complicated by variations in objectives, 
sample design, and coverage. Only a few notes about such survey data directly
pertinent to this article can be made here.⁴

The 1935-36 Consumer Purchases Study yielded many income and expendi- 
ture data, but the detailed food figures for segments of the population are dif- 
ficult to combine satisfactorily because the samples were not designed to pro- 
vide complete coverage. The 1942 Study of Spending and Saving in Wartime, 
the 1948 urban family food survey by the U. S. Department of Agriculture, and 
the survey of urban consumers made in spring 1951 by the U. S. Bureau of
Labor Statistics provided data on income and food expenditures in the pre-
ceding year, as recalled by consumers, as well as data on food consumption and
value for the preceding week. The 1955 Survey of Household Food Consump-
tion did not obtain information on the preceding year’s food expenditures;
food consumption data were collected only for the week preceding the interview
in spring 1955.⁵ But the Department of Agriculture’s 1948 and 1955 survey data
on urban food consumption and expenditures⁶ in the preceding week can be
compared with income data for the preceding year. Also, the Bureau of Labor
Statistics data on urban food purchases in a week of September-October 1944, 
February 1945, and spring 1951 are on a comparable basis. For spring 1942, 
we have available data that pertain to the preceding quarter’s income and the 
preceding week’s food use. 

Analysis of findings from surveys of a week’s food consumption must take
into account these facts: (1) During a limited period, the market availability of
goods and services is practically fixed. (2) Demand is relatively fixed or static,
because outside influences and intra-family relationships have no time to
change during the single week reported on by each respondent, although the
interviews may be spaced over a 3-month period. (3) The data may reveal
irregularities in consumption patterns, market structure, and prices that are
peculiar to the particular period. (4) Problems for some individual foods arise
because of seasonality. (5) Only housekeeping families are included, and adjust-
ment for meals eaten at home and away from home is made on a pro rata basis,
21 meals at home being equal to one person.⁷ Although such adjustment is necessary,
it may introduce bias, particularly in case of a notable change in the number
and kind of meals eaten out. (6) Classifications and definitions are adapted to
changing conditions and needs and incorporate accumulated experience. (7)





4 For detailed references to national surveys, see pp. 179-85 of [3] and the last page of any of the reports on the
1955 Survey of Household Food Consumption [19]. Data from the Bureau of Labor Statistics survey of income and 
expenditures in 1950 and food purchases in spring 1951 have been published by the University of Pennsylvania [20]. 

5 An article written for agricultural economists describes the 1955 food survey and some of its uses [6].

6 Food expenditures are approximated by the value of purchased food at home plus expenditures for meals, snacks,
and beverages away from home.

7 A detailed explanation of the 21-meal calculation and a description of the special problems with egg data are
given in [18, page 16].
Sampling and reporting errors varied, reflecting improvements in sampling and
collection methods on the one hand, and difficulties, such as obtaining coopera- 
tion of employed respondents and recall of data on more items, on the other. 
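The pro rata rule in item (5) amounts to simple arithmetic. The sketch below is our reading of it (the full procedure is documented in [18]); the household, its meals, and the dollar value are hypothetical. A week's meals eaten from the household food supply are summed and divided by 21 to give the number of person-equivalents over which that week's food is spread.

```python
# Sketch of the 21-meal adjustment (our reading of the pro rata rule; data are hypothetical).

MEALS_PER_PERSON_WEEK = 21  # 3 meals a day for 7 days


def equivalent_persons(meals_at_home):
    """Person-equivalents fed from the household supply: total meals eaten at home / 21."""
    return sum(meals_at_home) / MEALS_PER_PERSON_WEEK


def value_per_equivalent_person(weekly_food_value, meals_at_home):
    """Spread the week's money value of food at home over the person-equivalents."""
    return weekly_food_value / equivalent_persons(meals_at_home)


# Hypothetical week: two adults each eat 5 lunches away from home,
# two children eat every meal at home, and guests eat 3 meals.
meals = [16, 16, 21, 21, 3]
persons = equivalent_persons(meals)  # 77 / 21, about 3.67
per_person = value_per_equivalent_person(33.0, meals)
print(round(persons, 2), round(per_person, 2))
```

If members shift many meals away from home between surveys, the divisor shrinks accordingly, which is where the bias mentioned under item (5) can enter.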


2. ALTERNATIVE CONCEPTS AND TIME-SERIES MEASURES 
FOR INCOME AND FOR FOOD


Income


Although choices among several concepts and measures of income and pur- 
chasing power are important in the determination of income-food relation- 
ships for particular analyses, their consideration is outside the scope of this 
article. Some of the alternatives that must be mentioned are: total income
including money and nonmoney items, gross and net of income and other
taxes; past and expected income; permanent and transitory components of
income;⁸ current income augmented or decreased by changes in net worth,
such as use of consumer credit or liquid assets. Among survey data, a choice is 
also possible between average family income and per person income. The total 
money value of food consumed per family varies a little more with family in- 
come (i.e., has higher income elasticity) than do per-person values compared 
with per-person disposable money income, because of the smaller average 
family sizes in the lowest income groups. 
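Why smaller families at low incomes raise the per-family elasticity can be shown in a short derivation. It assumes, for illustration only, that per-person food value v depends on per-person income y with a constant elasticity b < 1, and that average family size n rises from lower to higher income classes; V = nv and Y = ny denote the family totals.

```latex
\begin{aligned}
d\log V &= d\log n + b\,d\log y, \qquad d\log Y = d\log n + d\log y,\\[4pt]
\frac{d\log V}{d\log Y}
  &= \frac{d\log n + b\,d\log y}{d\log n + d\log y}
   = b + (1-b)\,\frac{d\log n}{d\log n + d\log y} \;>\; b
   \quad\text{when } d\log n > 0,\ d\log Y > 0,\ b < 1 .
\end{aligned}
```

Under these assumptions the family-basis elasticity exceeds the per-person elasticity b, and the gap widens the more strongly family size rises with income.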

In this article both total disposable and disposable money income, currently 
received (or approximately so), are used to simplify the statistics, because at- 
tention is focused on the food aspects of the relationships. 


Preview of Food Concepts and Measures 


Only those meanings of food in overall terms which have direct economic in- 
terest are considered here; excluded are nutritive value data, poundage ag- 
gregates, and information on single food commodities or groups of foods. Overall 
food consumption has several quantitative and value meanings. An analyst 
must select both the concept and the measure of consumption fitting that 
concept which best suit the problem at hand. Some of the choices among time- 
series and cross-section data are surveyed in the following section. The pertinent 
time-series data on food given in the Appendix refer to civilian food only, 
beginning 1941, and exclude alcoholic beverages. 


Quantity of Food 
Changes in quantities of farm foods consumed are measured by the index 





8 The influences on current consumption of past and expected income and their permanent and transitory com-
ponents have been considered by a number of economists, including Kuznets, Friedman, Modigliani, Friend,
Kravis, Houthakker, Reid, Dunsing, and recently Nerlove [16]. It is quite likely that past and expected income
contribute somewhat to the stability of food consumption, but so do past consumption patterns and expected
supplies of food. As a practical matter, this writer has found the use of approximately current income to be reason-
ably indicative of current purchasing power and economic status except in periods such as 1942-47 when prices 
were controlled, supplies of alternative consumption goods limited, and other sources of purchasing power diverted 
to food. 

A recent article by Dunsing and Reid [11] analyzed the effect of transitory income on income elasticity of ex- 
penditures, including food, for two groups of farm families in 1940-42. Farm families probably have the most 
extensive experience with transitory income and the period covered by the financial records was one of marked 
changes in farm income. As yet this writer has not been able to incorporate the hypotheses regarding permanent 
and transitory income in her analyses.
of per capita food use of farm commodities, in effect at the farm or import level.⁹
This index covers imported and domestic farm foods, including foods produced
for home consumption, but excludes spices and fishery products because they
are not primarily farm commodities. (Series TFQ-1 in the Appendix.¹⁰)

The quantity and, to some extent, the quality of food consumed are measured 
by the index of per capita consumption of all food, in effect at the retail level 
(TFQ-2).¹¹ Changing quantities of individual foods and fixed retail prices are
used in its calculation. By measuring all foods consumed of both farm and 
nonfarm origin at the retail store level, whether commercially or home pro- 
duced, the index reflects all shifts in consumption among individual foods and 
most of the changes in marketing services other than those related to meal prep- 
aration and serving outside private homes. Even in the farm-to-retail store 
channel, however, it does not reflect increased marketing services required by 
the shift from home-produced food (included with fresh) to commercially 
marketed “fresh.” 
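Both quantity indexes share one construction: each year's quantities are valued at fixed base-period prices (average 1947-49 farm prices for TFQ-1, fixed retail prices for TFQ-2), put on a per capita basis, and expressed relative to the base period. A minimal sketch follows; the two-food basket, prices, and populations are hypothetical, and the base period is treated as a single set of quantities (for example, a 1947-49 average).

```python
# Sketch of a per capita quantity index at fixed base-period prices (hypothetical data).

def value_at_base_prices(quantities, base_prices):
    """Sum of quantity x fixed base-period price over all foods."""
    return sum(q * base_prices[food] for food, q in quantities.items())


def per_capita_quantity_index(quantities, population, base_quantities, base_population, base_prices):
    """Per capita consumption at base prices, as a percentage of the base-period level."""
    current = value_at_base_prices(quantities, base_prices) / population
    base = value_at_base_prices(base_quantities, base_prices) / base_population
    return 100.0 * current / base


base_prices = {"beef": 0.50, "milk": 0.10}        # hypothetical dollars per unit
base_quantities = {"beef": 100.0, "milk": 500.0}  # hypothetical base-period totals
later_quantities = {"beef": 120.0, "milk": 480.0}

index = per_capita_quantity_index(later_quantities, 11.0, base_quantities, 10.0, base_prices)
print(round(index, 1))
```

Because prices are held fixed, a shift toward foods with higher base prices raises the index even when total pounds are unchanged, which is how such an index registers shifts from cheaper to more expensive foods.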

Changes in the per capita use of all food marketing services (TFQ-3) can be
measured approximately by an index derived as follows: The total market
value of all food (to be described next) less the total supplier value¹² can be
divided by a price index for marketing services between the farm and retail
levels of distribution (derived from the marketing margin of the farm market
basket¹³). Placing these aggregates on a per capita basis and comparing them
with the base period 1947-49 provides a measure of changes in consumption or
use of food marketing services needed for the analysis of changes in the market
value of food and food expenditures.
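The steps just listed (subtract supplier value from market value, deflate by a marketing-services price index, divide by population, and compare with 1947-49) can be sketched as follows. All figures are hypothetical (think of billions of dollars and millions of persons), and the price index is assumed to be expressed with the base period equal to 100.

```python
# Sketch of the per capita food marketing services measure (hypothetical figures).

def deflated_services_per_capita(market_value, supplier_value, price_index, population):
    """(Market value - supplier value) deflated by a price index (base = 100), per person."""
    real_services = (market_value - supplier_value) / (price_index / 100.0)
    return real_services / population


base = deflated_services_per_capita(40.0, 15.0, 100.0, 145.0)   # base period
later = deflated_services_per_capita(60.0, 20.0, 125.0, 165.0)  # a later year
services_index = 100.0 * later / base
print(round(services_index, 1))
```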


Value of Food 


The value of food consumed involves prices and quantities of food as such 
and of marketing services bought with food. The several time series of value of 
food used here have been developed by the writer primarily from data of the 
Agricultural Marketing Service, supplemented by data on markups over retail 
for food bought away from home derived from Department of Commerce 
published series on food expenditures and some unpublished information.¹⁴
These value series match quite closely data derived with necessary adjustments 
from (1) the Department of Commerce series on food expenditures for the 
years 1944-56, as revised in the 1958 edition of National Income [15], and (2) 





9 The quantities of individual commodities consumed by U. S. civilians in each year, in terms of farm weight
equivalents, are multiplied by average farm prices in 1947-49. For full description of this subindex of the master 
index of supply-utilization, see particularly [4, page 67]. The supplements for 1956 and 1957 bring the series to date. 

10 TFQ-1 represents time series, food quantity, No. 1. 

11 The index reflects shifts from cheaper to more expensive kinds of foods, and vice versa, but not changes in
quality of individual foods consumed. For more complete description and current data, see pages 132-52 of [3] 
and its supplements for 1956 and 1957. 

12 Farm value of all farm foods plus import value of imported foods plus payments to fishermen.

13 The AMS “market basket” series is constructed by pricing a fixed market basket of farm food commodities 
(the average quantities of farm products purchased for consumption at home by urban wage-earner and clerical- 
worker families in 1952) at the farm level, using prices received by farmers, and at the retail level, using BLS 
average retail prices, in general. The difference between farm costs and retail costs is the marketing margin. For
further explanation, see [17]. 

14 The writer wishes to acknowledge extensive assistance in the forms of interpretation of published series and
access to unpublished data received from Kenneth Ogren of the AMS (as well as aid in economic analysis), and
Lawrence Grose and Edward Basset of the Office of Business Economics, U. S. Department of Commerce.
the value aggregates of the Agricultural Marketing Service index of per capita
food consumption (TFQ-2). Prior to 1944, the series based on AMS and Com- 
merce data diverge, apparently because of differences in the levels of food pro- 
duction indicated by the Censuses of Manufactures and the estimates of the 
Agricultural Estimates Division, AMS. Because the measures of food quantity 
used herein are those of the AMS, it is desirable to use matching value data. 

A series on total market value of all food should include the value in current 
dollars of all foods of farm and nonfarm origin, both bought and home pro- 
duced, and cover food consumed at home and away from home. Such a series 
reflects changes in farm and other primary inputs and in all marketing inputs 
in the form of marketing services rendered at all stages from the farm gate or 
dock to the retail store or eating place. 

The series on market value of all food consumed at home and away from
home (TFV-1)¹⁵ is based on the marketing bill data of the Agricultural Market-
ing Service [17, pp. 48 ff.].¹⁶ The retail cost of farm foods sold has been adjusted
to the concept of total market value by adding estimates of the extra costs of 
buying food in the form of meals rather than at retail stores, the farm value of 
food consumed on farms where produced and of nonfarm production for home 
use, and the retail values of imported foods and nonfarm foods. Because some 
food is sold to consumers at less than retail-store prices by farmers, processors, 
and wholesalers, an allowance for these differences between market values and 
retail values must be subtracted. Taxes and tips were not included in the food 
value series used in this article, except for the one on food expenditures. 
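The adjustments that turn the retail cost of farm foods sold into total market value (TFV-1) form a simple additive identity. The sketch below spells it out; the field names and figures are hypothetical labels for the components named in the text, not the actual AMS worksheet.

```python
# Sketch of the TFV-1 identity described in the text (hypothetical component values).
from dataclasses import dataclass


@dataclass
class MarketValueComponents:
    retail_cost_farm_foods_sold: float
    extra_cost_of_meals: float            # buying food as meals rather than at retail stores
    farm_value_home_produced: float       # food eaten on farms where produced, plus nonfarm home production
    retail_value_imported_nonfarm: float  # imported foods and nonfarm foods
    below_retail_sales: float             # allowance for sales below retail-store prices


def total_market_value(c: MarketValueComponents) -> float:
    """Add the components named in the text and subtract the allowance (taxes and tips excluded)."""
    return (c.retail_cost_farm_foods_sold
            + c.extra_cost_of_meals
            + c.farm_value_home_produced
            + c.retail_value_imported_nonfarm
            - c.below_retail_sales)


example = MarketValueComponents(50.0, 8.0, 3.0, 6.0, 2.0)
print(total_market_value(example))
```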

A measure of total dollar outlays for food, properly called “food expendi-
tures,” should exclude food obtained without direct expense, such as home-
produced food. The series used here (TFV-2) was derived from the market value
series (TFV-1) by subtracting these components and adding taxes and tips.
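In the same spirit, a minimal sketch of the step from TFV-1 to food expenditures (TFV-2). The figures are hypothetical, and `nonpurchased_values` stands for whatever food obtained without direct expense is removed (home-produced food being the example the text gives).

```python
# Sketch: food expenditures (TFV-2) from total market value (TFV-1); hypothetical figures.

def food_expenditures(market_value, nonpurchased_values, taxes_and_tips):
    """Remove food obtained without direct expense, then add taxes and tips."""
    return market_value - sum(nonpurchased_values) + taxes_and_tips


tfv1 = 65.0
tfv2 = food_expenditures(tfv1, nonpurchased_values=[3.0], taxes_and_tips=1.5)
print(tfv2)
```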

For some purposes, a retail value series is needed. Such a measure values all 
foods consumed at the retail store level, whether purchased at retail or home 
produced or bought as meals. Therefore, the measure can reflect only part of 
the changes in marketing services sold with food, excluding changes outside the
usual farm-to-retail channel. The series used here (TFV-3) includes the retail
value (or cost) of farm foods sold by farmers, the retail value of all home- 
produced farm foods (by farm and nonfarm households) and the retail value 
of imported and nonfarm foods (i.e. fish). 

Several other sets of value data are pertinent to the study of changes in 
income-food relationships through time. Therefore, the following series are 
given in the Appendix: the farm value of food sold by U. S. farmers (TFV-4);
farm value of farm and nonfarm home-produced food (TFV-5); total supplier
value of all farm, imported, and nonfarm foods (TFV-6); and the total food
marketing bill (TFV-7) derived by subtracting total supplier value (TFV-6)
from the total market value (TFV-1). 

At present, satisfactory time-series data can be constructed only for the total 
of food consumed at home and away from home. The Commerce series on off- 
premise sales of food and beverages (a component of total food and beverage 





15 TFV-1 represents time series, food value, No. 1.
16 A description of the detailed procedures and data used in developing these new series has not been published,
but will be included in a bulletin on analysis of changes in food consumption now in preparation. 
expenditures) cannot be subdivided to yield a much-needed measure of away-
from-home eating because no reliable breakdown is available for alcoholic 
beverages. Although an approximate subdivision of alcoholic beverages was 
made by the writer as one step in deriving the markups for eating places, it is 
too arbitrary for satisfactory detailed analyses. In contrast, numerous sub- 
divisions are an important contribution of the survey data. 


3A. INCOME-FOOD RELATIONSHIPS FROM TIME-SERIES DATA 


Overall income-food relationships have been derived using the principal 
measures of food just described, total disposable and disposable money income, 
and simple statistical procedures. Each of the least-squares regressions sum-
marized in Table 912 is a linear regression of logarithms of actual values or
indexes. The general form of the equation used was
$\log X_1' = \log a + b \log X_2 \; (+\, c \log X_3)$,
where $X_1'$ is the computed value of the food measure, $X_2$ is income, and the
bracketed term applies only where a third variable $X_3$ was included. The years
included in the prewar period were determined by the
availability of reasonably adequate data. World War II years were not in-
cluded for obvious economic reasons. Postwar price changes and the use of
liquid assets to supplement current income resulted in abnormal income-food 
relationships in 1946 and 1947, so these years have been excluded. Although at 
least part of 1948 was also somewhat abnormal in these respects, the desirabil- 
ity of including as many postwar observations as possible led to its inclusion in 
the postwar period. 
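A minimal sketch of fitting the general form above by ordinary least squares on logarithms (base 10 here; the slope b, the income elasticity, does not depend on the base). The income and food series are hypothetical and were built with an elasticity of exactly 0.3 so the fit can be checked; a third variable would require multiple regression and is omitted.

```python
# Sketch: least-squares fit of log X1 = log a + b log X2 (b = income elasticity).
# Data are hypothetical and constructed with a known elasticity of 0.3.
import math


def fit_log_log(x2, x1):
    """Ordinary least squares of log10(x1) on log10(x2); returns (log_a, b)."""
    lx = [math.log10(v) for v in x2]
    ly = [math.log10(v) for v in x1]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((u - mx) ** 2 for u in lx)
    sxy = sum((u - mx) * (w - my) for u, w in zip(lx, ly))
    b = sxy / sxx
    return my - b * mx, b


income = [80.0, 90.0, 100.0, 115.0, 130.0, 150.0]  # hypothetical per capita real income index
food = [2.0 * y ** 0.3 for y in income]            # hypothetical food index, elasticity 0.3
log_a, b = fit_log_log(income, food)
print(round(b, 3), round(10 ** log_a, 3))
```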

For several regressions, disposable money income was used instead of total 
disposable income (1) in order to determine if its relationship to food con- 
sumption has differed from that of total disposable income (it has not) and 
(2) in order to match the income concept used in most cross-section data. Time 
was added as a third variable in a regression with the measure of food marketing 
services and income, but its regression coefficient was insignificant in magnitude 
in all three periods. 

The t test was used to test differences between b values in sets of regressions,
with the results reported at pertinent points in the following sections.
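The text does not spell out the form of the t test. One common version for comparing b values from two separately fitted regressions divides their difference by the square root of the sum of their squared standard errors, which assumes the two samples are independent. The sketch below shows that version with hypothetical coefficients; the resulting t is then compared with the t distribution for the appropriate degrees of freedom, whose two-sided 5 per cent point is close to 2 when there are 20 or more degrees of freedom.

```python
# Sketch: t ratio for the difference between two regression coefficients b1 and b2.
# Assumes the regressions are fitted to independent samples; all numbers are hypothetical.
import math


def slope_and_se(x, y):
    """OLS slope of y on x and its standard error, with n - 2 degrees of freedom."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((u - mx) ** 2 for u in x)
    b = sum((u - mx) * (v - my) for u, v in zip(x, y)) / sxx
    a = my - b * mx
    sse = sum((v - a - b * u) ** 2 for u, v in zip(x, y))
    return b, math.sqrt(sse / (n - 2) / sxx)


def t_for_difference(b1, se1, b2, se2):
    """(b1 - b2) divided by the standard error of the difference."""
    return (b1 - b2) / math.sqrt(se1 ** 2 + se2 ** 2)


t = t_for_difference(0.45, 0.03, 0.33, 0.04)  # hypothetical prewar vs. postwar b's
print(round(t, 2))
```

Regressions for the combined periods share observations with the prewar and postwar ones, so the independence assumption would not hold for those comparisons.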

The essential questions about the results presented in Table 912 for con-
sideration here are: (1) Do changes or variations in income affect the several
aspects of food consumption described by these food measures differently?
(2) Have the relationships between income and each of these food
measures changed in the last 30 years?


Quantity Measures—Findings 


1. The relationships to per capita real income of per capita food use of farm 
commodities (farm level) and per capita consumption of all food (retail level) 
did not change significantly from prewar to postwar, either in slope (i.e. income 
elasticity) or in level. (Regressions T-I A and T-I B of Table 912.) 

2. The slope of the regression line between per capita use of food marketing 
services and per capita real income did not change significantly between the 
two periods, but the level shifted upward (T-I C 1-3).¹⁷ This change in level





17 The upward shift in level is not reflected by the a values at the origin (where disposable income is equal to
zero), but shows up in the range of observations. A statistical test of the a’s indicated no significant difference be-
tween the levels of the two periods, apparently because of the wide variation in the prewar observations. However,
analyses of related economic data strongly indicate significant changes, particularly because of the drastic decline 
in home production. 
tilted the regression line for the two periods combined and points up a basic
problem in the analysis of economic time series, namely, identification of hitherto
unspecified economic or social factors which change the relationship of one
economic measure to others. About two-thirds of this increase in per capita use
of marketing services was tied in with the reduction in home production of
food for family use, itself a result of a number of factors.¹⁸ The time series for
food marketing services shows that much of the change occurred in 1939-41 
and 1945-47, periods of major shifts in the American economy. 

3. Statistical tests of the difference between the b’s for income in the regression
with food use at the farm level (T-I A) and that with marketing services (T-I
C 1) yield t values of .9 and 1.6 for the prewar and postwar periods, respectively,
indicating that the coefficients are not significantly different at the 5 per cent 
level. 

We may conclude that changes in income affected food use and food market-
ing services to about the same degree in the prewar years, but somewhat dif-
ferently in the postwar period. Even though the statistical test does not show
up a significant difference between the two b’s, the relationships are not inter-
changeable because of the great conceptual difference. The increase in the b’s
from prewar to postwar is in line with known dynamic changes in the economy
associated with the rising level of real income in this country, such as an increasing
degree of urbanization, occupational shifts, decreased home food production,
and changes in ways of living. These have affected the purchases of marketing
services far more than food as such.


Value Measures—Findings 


1. The prewar regressions of (a) market value of all food at home and away 
from home (T-II A) and (b) all food expenditures (T-II B) with income reveal 
no difference between the two measures.¹⁹ In the postwar period, the two b’s
were significantly different only at the 30 per cent level of probability. For the 
combined periods, the difference was significant at about the 20 per cent level. 
Here again, the “failure” of a statistical test of differences does not indicate 
interchangeability of two series which have different economic meanings. 

2. The differences between the regression coefficients for income with (a) 
food expenditures (T-II B 1) and (b) retail value of all food (T-II C) followed 
the same progression toward significance, with the postwar difference significant 
at the 10 per cent level, and the b’s for the combined periods very significantly 
different. 

3. The income elasticity of each of the three food value measures (i.e. the 
regression coefficient for income) decreased significantly from the prewar period 
to the postwar period. Gradual increases in average use of food marketing 
services (not directly related to income) apparently have resulted in the lower 
income elasticities for the postwar period. 

4. For the market value of all food (T-II A 1) and for food expenditures
(T-II B 1), the income elasticity derived for the combination of prewar and





18 For further analysis, see “Home Production: Part II” in the July 1958 issue of [14] and [1]. 

19 The writer has some reservations concerning the reliability of the estimates of home production of food in
that period (believing them to be underestimated) because of the lack of basic data to serve as benchmarks. How-
ever, they are the “best” estimates that can be made.
postwar years was significantly higher than for prewar. For retail value, the
difference was not significant below the 15 per cent level of probability. These 
changes are tied to the change in level of per capita use of marketing services. 

5. Another way of measuring changes in income-food value relationships is
to compare the proportions of income per capita allocated to food in selected
years. The following data, derived using total disposable income per capita
and market value of all food, show that the proportion in 1955, 1956, and 1957
has fallen below pre-World War II relationships:

Year    Per Cent        Year            Per Cent
1929    26              1949            26
1934    30              1954            23
1939    24              1955            22
1941    23              1956            22
1944    22              1957 (prel.)    22


3B. INCOME-FOOD RELATIONSHIPS FROM CROSS-SECTION DATA 


Several of the meanings of food elaborated and matched with time-series 
measures in Part 2 can also be matched with measures developed from cross- 
section data. Income figures and value data for all food combined are usually 
available directly from the reports on the household food surveys, but eco-
nomic measures of the quantity of all food consumed have not been available.²⁰

To measure variations in over-all quantity of food consumed by major seg-
ments of the U. S. population, three structural indexes have been developed
recently by the Consumption Section of the Agricultural Marketing Service.
Two indexes match the definitions of the time-series index of per capita food
use of farm commodities (TFQ-1). The data for consumption at home from the
1955 Survey of Household Food Consumption were converted to their farm 
commodity equivalents and valued at 1947-49 farm prices. One of these two 
structural indexes covers consumption at home from all sources, the other only 
purchased foods. The third index measures variations in consumption at home 
from all sources in terms of average retail value at 1947-49 average prices. This 
index matches the time-series index of per capita food consumption (TFQ-2). 
Each of these three indexes, as yet unpublished, relates the per-person food 
consumption averages of households in the 1955 survey in each income class of 
each urbanization category to the U. S. average (equal to 100). Income-food 
relationships derived using these measures and disposable money income are 
given in Table 915 (CS-I A, B, and C). 
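The construction described above has two steps. First, per-person consumption is valued in farm-commodity equivalents at fixed 1947-49 prices. Second, each income-class average is expressed relative to the U. S. average, which is set equal to 100. The sketch below assumes that scheme. All commodity names, quantities, prices, and class labels in it are hypothetical placeholders, not survey figures.

```python
from typing import Dict

Prices = Dict[str, float]   # fixed base-period (e.g. 1947-49) price per unit of each commodity
Basket = Dict[str, float]   # per-person quantity of each commodity, in farm equivalents

def fixed_price_value(basket: Basket, prices: Prices) -> float:
    """Value a per-person basket at fixed base-period prices (a quantity aggregate)."""
    return sum(qty * prices[item] for item, qty in basket.items())

def structural_index(groups: Dict[str, Basket], us_average: Basket, prices: Prices) -> Dict[str, float]:
    """Express each group's fixed-price value relative to the U. S. average (= 100)."""
    base = fixed_price_value(us_average, prices)
    return {name: 100.0 * fixed_price_value(b, prices) / base for name, b in groups.items()}

# Hypothetical illustration only.
prices_1947_49 = {"milk": 0.05, "beef": 0.20, "wheat": 0.03}
us_avg = {"milk": 10.0, "beef": 2.0, "wheat": 5.0}
groups = {
    "urban, low income": {"milk": 8.0, "beef": 1.5, "wheat": 5.0},
    "urban, high income": {"milk": 12.0, "beef": 3.0, "wheat": 4.0},
}
print(structural_index(groups, us_avg, prices_1947_49))
```

Because prices are held fixed, differences in the index reflect only differences in the quantities consumed.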


Using Quantity 

The relationships to income of the two structural measures of all food con-
sumed at home in spring 1955, one at the farm level (CS-I A), the other at the
retail level (CS-I C), were practically identical for each of the four urbaniza-
tion categories (all, urban, rural nonfarm, and farm). However, they varied
from one category to another. The relationship to income of the third index,
for purchased foods measured at the farm level (CS-I B), differed from those of
the indexes for all sources. The use of purchased farm commodities rose signif-
icantly more with income for successively higher income classes for (a) all
urbanizations combined, (b) rural nonfarm, and (c) farm households than did the
other two measures. The urban income coefficients for the three measures
were not significantly different because home production is only a minor source
of food for urban households.

20 The total poundage of food consumed is not an economic measure because it is affected primarily by water
and cellulose content and does not reflect costs of production and distribution or consumer preferences.

TABLE 915

SUMMARY OF LEAST-SQUARES REGRESSIONS WITH CROSS-SECTION
DATA ON SELECTED MEASURES OF FOOD AND INCOME1

Food measure (X1),                    Survey       Income (X2), per person:      Regression
per person                            period       disposable money income      coefficient3

Food expenditures at home and away    1950                                       .37 (.04)

Total market value of food at home
and away (CS-II D)
  1  All U. S.                        Spr. 1942                                  .30 (.03)
  2  All U. S.                        Spr. 1955                                  .25 (.08)
  3  Urban                            Spr. 1942                                  .31 (.03)
  4  Urban                            Spr. 1955                                  .27 (.04)
  5  Rural nonfarm                    Spr. 1942                                  .32 (.03)
  6  Rural nonfarm                    Spr. 1955                                  .30 (.02)
  7  Farm                             Spr. 1942                                  .19 (.02)
  8  Farm                             Spr. 1955                                  .11 (.02)

Food expenditures at home and away
(CS-II E)
  1  All U. S.                        Spr. 1942    1st Q 1942                    .52 (.02)
  2  All U. S.                        Spr. 1955    1954                          .37 (.02)
  3  Urban                            Spr. 1942    1st Q 1942                    .38 (.02)
  4  Urban                            Spr. 1948    1947                          .30 (.03)
  5  Urban                            Spr. 1955    1954                          .29 (.03)
  6  Rural nonfarm                    Spr. 1942    1st Q 1942                    .45 (.04)
  7  Rural nonfarm                    Spr. 1955    1954                          .40 (.02)
  8  Farm                             Spr. 1942    1st Q 1942                    .31 (.06)
  9  Farm                             Spr. 1955    1954                          .22 (.03)

1 Linear regression in logarithms.
2 Households of two or more persons except where indicated.
3 Standard errors given in parentheses.
4 Including singles.
5 Computed in 1935-39 dollars.
6 Computed in current dollars.
7 Computed on same dollar basis.
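Note 1 to Table 915 states that the regressions are linear in logarithms. Each coefficient can therefore be read as an income elasticity, and note 3 gives its standard error. The sketch below, written with NumPy, shows how such a fit could be made. It assumes unweighted least squares on income-class averages, which the article does not specify; the survey analysis may have weighted the classes differently. The data are synthetic. The sketch uses natural logarithms, and the slope (the elasticity) is the same whatever the base of the logarithms; only the constant term changes.

```python
import numpy as np

def log_log_fit(income, food):
    """OLS of log(food) on log(income).

    Returns (constant, elasticity, standard error of the elasticity).
    """
    x = np.log(np.asarray(income, dtype=float))
    y = np.log(np.asarray(food, dtype=float))
    X = np.column_stack([np.ones_like(x), x])          # intercept and log income
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    dof = len(y) - X.shape[1]                           # n minus 2 estimated parameters
    sigma2 = float(resid @ resid) / dof                 # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)               # covariance of the estimates
    return float(coef[0]), float(coef[1]), float(np.sqrt(cov[1, 1]))

# Synthetic income-class averages (not survey data): elasticity near 0.3 plus small noise.
income = [500, 1000, 1500, 2000, 3000, 5000]
food = [2.0 * m ** 0.3 * f for m, f in zip(income, [1.01, 0.99, 1.00, 1.02, 0.98, 1.00])]
const, elasticity, se = log_log_fit(income, food)
print(f"elasticity = {elasticity:.3f} ({se:.3f})")
```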

Tests of the relationships between income and the use of farm foods from all 
sources for the three separate urbanization categories (CS-I A 2, 3, and 4) in- 
dicate that the income elasticity for rural nonfarm households was significantly 
higher than the urban and that for farm households significantly lower. In the 
case of the purchased food index (CS-I B 2, 3, and 4), the rural nonfarm income 
elasticity was even higher. Differences in levels of food purchased by households 
in the three urbanizations tilted the regression line for all urbanizations com- 
bined. (This phenomenon is comparable to that observed when the prewar and 
postwar sets of time-series data are combined.) The urban and rural nonfarm 
income elasticities (as measured by the regression coefficients) differed less 
significantly for the food consumption index measured at the retail level (CS-I 
C 2 and 3) than for the index of food use from all sources, measured at the farm 
level (CS-I A 2 and 3). 

The consistently higher degree of elasticity found for the rural nonfarm 
category probably denotes the marked difference between the food patterns 
of lower income families in this group, which resemble the farm patterns, and 
those of high income families, resembling food patterns of high income urban 
families. 


Using Annual Value Data 


In an earlier paper [2] the writer reported a number of income elasticities
for food measures derived from annual recall data. The income elasticities for
the market value of all food at home and away consumed by all U. S. house-
holds were equal in 1935-36 and 1941 (CS-II A 1 and 2).21 The income elasticities
of expenditures for food and beverages at home and away from home by urban
households were significantly lower in 1944, 1947, and 1950 than the .58 for
1941 (CS-II B). A t test showed the .40 for 1950 to be significantly different
from the .31 for 1947 at the 8 per cent level of probability. In the opinion of this
writer, the higher b for 1950 marks some recovery from the abnormal income-
food relationships of wartime and the immediate postwar years.
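Comparing the 1950 and 1947 coefficients requires a test of the difference between two regression coefficients. The article does not state its formula, so the sketch below assumes a common form. It treats the two estimates as independent and divides their difference by the combined standard error. It also uses a normal approximation for the two-sided probability. A t table with the appropriate degrees of freedom would be more exact when only a few income classes are involved. The standard errors in the example are hypothetical.

```python
import math

def coefficient_difference_test(b1: float, se1: float, b2: float, se2: float):
    """Return (t, two-sided p) for H0: b1 == b2, assuming independent estimates.

    p uses the normal approximation to the t distribution.
    """
    t = (b1 - b2) / math.sqrt(se1 ** 2 + se2 ** 2)
    p = math.erfc(abs(t) / math.sqrt(2.0))   # two-sided tail area of N(0, 1)
    return t, p

# Coefficients from the text; the standard errors are illustrative only.
t, p = coefficient_difference_test(0.40, 0.04, 0.31, 0.03)
print(f"t = {t:.2f}, two-sided p = {p:.3f}")
```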


Using Spring Value Data 


The surveys of household food consumption made by the Department of
Agriculture in spring 1942 (with BLS), 1948 (urban only), and 1955 supply
data for more detailed analysis of changes from one period to another in cross-
section relationships between income and value of food consumed. The data
used for this analysis were income class averages for income and for (a) total
market value of food consumed at home and away from home in a week of
spring and (b) corresponding expenditures for food at home and away from
home. The following adjustments were made in the data given in survey reports:
(1) All data were put on a per person basis, using the count of family members.
(2) The income data were converted to 1954 dollars using the Consumer Price
Index. (3) Food values were converted to spring 1955 dollars, using the BLS
retail food price index.

21 Improperly identified as food “expenditures.” Footnote 2, p. 285 [2].
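The three adjustments listed above amount to dividing by household size and then rescaling by ratios of price indexes. The sketch below assumes exactly that. The function names are hypothetical, and the household figures and index values in the example are invented placeholders, not published CPI or BLS figures.

```python
def per_person(total: float, family_members: float) -> float:
    """Adjustment (1): put a household figure on a per person basis."""
    return total / family_members

def to_base_dollars(value: float, index_at_value_date: float, index_at_base_date: float) -> float:
    """Re-express a money value in base-date dollars using a price index ratio."""
    return value * index_at_base_date / index_at_value_date

# Hypothetical household from the 1942 survey (placeholder numbers).
income_1942 = per_person(2400.0, 4)       # annual income per person, current dollars
food_week_1942 = per_person(12.0, 4)      # weekly food value per person, current dollars

# Adjustment (2): income to 1954 dollars with the CPI (placeholder index values).
income_1954_dollars = to_base_dollars(income_1942, index_at_value_date=70.0, index_at_base_date=115.0)
# Adjustment (3): food value to spring 1955 dollars with a retail food price index (placeholders).
food_1955_dollars = to_base_dollars(food_week_1942, index_at_value_date=60.0, index_at_base_date=110.0)
print(round(income_1954_dollars, 2), round(food_1955_dollars, 2))
```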

[Fig. 917. Market value of all food consumed per person in all U. S. families in a week, spring 1942 and 1955.]

The Engel curves for market value of all food at home and away for all U. S.
households in spring 1942 and 1955 are given in Fig. 917. The difference in the
slopes of the two linear regressions (Table 915 CS-II D 1 and 2) is not signifi-
cant below the 35 per cent level of probability, according to the t test. How-
ever, supporting data reveal some important economic changes which add
credibility to the significance of this shift. The urbanization “mix” of each
real income class changed from 1942 to 1955. The relatively greater rise in the
level of the all-U. S. curve in the lower range of income reflects the significant
change in the slope of the farm regression line (CS-II D 7 and 8) and the gen-
erally higher level of the rural nonfarm line (CS-II D 5 and 6). An analysis of
rural changes in food consumption, developed in another article [1], found:
(1) A decrease in the number of low-income cotton farmers materially changed
the composition of the low-income classes and thus raised average food con-
sumption of low-income farmers. (2) Use of freezing facilities greatly increased
the home production and consumption of beef by farm households. (3)
Changes in places of occupation and ways of living of rural nonfarm families
led to changes in their food patterns.

Fig. 918, with the 1955 data for U. S. families grouped by urbanization, and the
data in Table 915 (CS-II D) show the total market value of all food per person
consumed in farm families to be much less elastic with income than that for
nonfarm families. This difference probably results from (1) much greater use of
home-produced foods by farm families, (2) the effect of other nonmoney
income, and (3) the fact that a single year’s income is a much less satisfactory
measure of purchasing power for farm families than for urban families because of
relatively greater year-to-year variability in farm income. As the proportion of
farm families in the total U. S. population declines, income elasticities for the
whole population may rise slightly. It is likely that the preponderance of farm
families in the income groups below $2,000 in 1955 flattens out that part of the
Engel curve for all households.22

[Fig. 918. Market value of all food consumed per person in U. S. families grouped by urbanization, in a week, spring 1955. Vertical axis: dollars; horizontal axis: disposable money income per person in 1954 ($200 to $2,000). Data from 1955 Survey of Household Food Consumption.]

22 Comparable charts of data for the four regions for all families showed the same flattening effect of farm
families’ food patterns on the Engel curve for all families in the South. (Almost half of the country’s farm population
lives in this region.) This curve was at a lower level throughout the income range than those for other regions. The
relationship of the market value of all food to income for the Northeast was at a definitely higher level than for
other regions through the $3,000-8,000 income range. The same inter-regional relationships were evident for urban
families.

The income elasticity of total food expenditures at home and away from home
(CS-II E 1 and 2) decreased significantly from spring 1942 to spring 1955.23
Recall that this food measure excludes home-produced food and food received 
as gifts and payment in kind, which are relatively more important for families 
with lower money income, notably rural families. Therefore, it is not surprising 
to find (1) that the income elasticity of food expenditures was significantly 
greater than that for market value of all food both in 1942 and 1955 and (2) 
that there was a marked decrease in the degree of this difference between the 
two years. The decreased income elasticity of food expenditures reflects the 
change in the urbanization “mix” and the change in the levels of the Engel
curves for the component urbanization categories, tied in with the drastic cut- 
back in home production. Also, the relatively easier food supply and price 
situation in spring 1955, compared with spring 1942, may have affected income 
elasticity slightly. 


4. COMPARISON OF INCOME-FOOD RELATIONSHIPS DERIVED FROM 
THE TWO TYPES OF DATA 


No tests of differences between income-food relationships derived from time- 
series data and those derived from cross-section data have been made. Such 
tests would have no economic or statistical meaning because different popula- 
tions are involved. Time series reflect dynamic changes in the economy and in 
our society whereas individual cross-section surveys provide static pictures. 
However, comparative study of the two sets of relationships can yield useful 
insights. 

A satisfactory structural index has not been constructed from the spring
1942 food consumption data, so there is no “prewar” comparison for the spring
1955 income elasticities for use of farm foods at the farm level (all sources) and
for consumption at the retail level, both of which are .12 (CS-I A 1 and CS-I C 1).
The time-series measures matching these in definition yielded elasticities of .17
and .20, respectively, for 1948-57 (T-I A and B). The higher elasticity for time
series is a generally observed phenomenon and will be considered more fully
below.

In comparing changes from prewar to postwar in the income elasticities for 
the two food value measures used here, based on time-series and spring cross- 
section data, it is necessary at the outset to note that spring 1942 is a very 
“late” date to be describing prewar income-food relationships, but those data 
are the only comparable ones available. The income elasticities pertinent to this 
comparison, taken from Tables 912 and 915, are repeated here: 

a. For market value of all food, using disposable money income
                                              1929-41        1948-57
   (1) Time series (T-II A 2)                   .68            .38
                                            Spring 1942    Spring 1955
   (2) Cross section (CS-II D 1 and 2)          .30            .25

b. For food expenditures at home and away from home, using disposable money income
                                              1929-41        1948-57
   (1) Time series (T-II B 2)                   .67            .48
                                            Spring 1942    Spring 1955
   (2) Cross section (CS-II E 1 and 2)          .52            .37





23 A comparable change in income elasticities was noted for meat consumption by urban families between
spring 1942 and spring 1948 and appraised in the final report on the 1948 urban food survey [8, pp. 47-50].

Tests of significance indicated that the time-series income elasticities for
market value were significantly different at the 2 per cent level of probability 
whereas those from cross-section survey data were not significantly different 
until the 35 per cent level. For food expenditures the income elasticities from 
time-series data were significantly different at the 8 per cent level whereas the 
cross-section elasticities differed at the 0.1 per cent level. 


Reasons for Differences 


Next, the vital question in the analysis of income-food relationships must be 
considered: Why are the regression coefficients of food in relation to income 
derived from the two types of data so consistently different? 

There are a number of reasons. First, statistical difficulties in both types of 
data and differences in coverage contribute to an unknown degree. The time- 
series data cover the entire civilian population, but the surveys encounter 
problems in sampling and reporting for both the highest and lowest income 
groups of families, and usually do not cover any of the nonhousekeeping popu- 
lation. 

Even more important is the fact that changes in the distribution of families by
income group affect time-series relationships but not the cross-section relation-
ships or Engel curves; the latter, by definition, cover only one point in time.
The effect of such changes in income distribution on per person averages for
food consumed by all U. S. housekeeping households (which are broadly com-
parable with U. S. per capita data or time series) can be demonstrated with
manipulations of the survey data for the springs of 1942 and 1955. Given (1)
the spring 1942 average market value of all food at home and away per person
for each real income group and (2) the 1942 distribution of the housekeeping
population by urbanization,24 the change in income-size distribution of the
population within each urbanization from spring 1942 to 1954 accounted for
about 45 per cent of the increase in the U. S. average market value of food per
person from 1942 to 1955.
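The reweighting described above, and in the accompanying footnote, works as follows. Each urbanization's 1942 income-class averages are held fixed. The 1955 distribution of family members among the same real-income classes is then applied, and the urbanizations are combined with 1942 population weights. The sketch below follows that scheme and assumes the class limits are already on a 1942 dollar basis. Every number in it is a hypothetical placeholder.

```python
from typing import Dict, List

def weighted_average(values: List[float], weights: List[float]) -> float:
    """Weighted mean; weights need not sum to one."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

def us_average(class_means: Dict[str, List[float]],
               class_shares: Dict[str, List[float]],
               urban_weights: Dict[str, float]) -> float:
    """Average over income classes within each urbanization, then across urbanizations."""
    by_urb = {u: weighted_average(class_means[u], class_shares[u]) for u in urban_weights}
    return weighted_average([by_urb[u] for u in urban_weights], list(urban_weights.values()))

# Hypothetical inputs: weekly food value per person by real-income class (1942 dollars).
means_1942 = {"urban": [2.0, 3.0, 4.0], "rural nonfarm": [1.5, 2.5, 3.5], "farm": [1.8, 2.2, 2.6]}
shares_1942 = {"urban": [0.4, 0.4, 0.2], "rural nonfarm": [0.5, 0.3, 0.2], "farm": [0.6, 0.3, 0.1]}
shares_1955 = {"urban": [0.2, 0.4, 0.4], "rural nonfarm": [0.3, 0.4, 0.3], "farm": [0.4, 0.4, 0.2]}
urban_weights_1942 = {"urban": 0.6, "rural nonfarm": 0.2, "farm": 0.2}

base = us_average(means_1942, shares_1942, urban_weights_1942)
reweighted = us_average(means_1942, shares_1955, urban_weights_1942)
observed_1955 = 3.4   # hypothetical actual 1955 U. S. average, 1942 dollars
share_explained = (reweighted - base) / (observed_1955 - base)
print(f"income-distribution shift explains {100 * share_explained:.0f}% of the increase")
```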

For food expenditures at home and away from home, the shift from 1942 to
1955 in the distribution of family members by family income (in constant
dollars) accounted for 40 per cent of the increase in the U. S. average from
1942 to 1955.

Changes in urbanization from 1942 to 1955 had relatively little effect on the 
market value of all food, apart from those related to concurrent changes in 
income. But they accounted for 15 per cent of the increase in food expenditures. 
Again, the population shift is a dynamic factor not measurable in a single 
survey, which resembles a still picture. 

Fig. 917 for all U. S. housekeeping households shows the net effect of changes 
in the Engel curves of the three urbanizations between 1942 and 1955, i.e. 
changes in (a) the patterns of relationship between income and market value of 





™ The percentage distribution of family members by family income for each urbanisation category for 1955, 
according to 1954 income in terms of first quarter 1942 dollars, were derived by shifting the cumulative curves to 
the left according to the percentage increase in the CPI. These curves are cumulative distributions of the population 
plotted against income. 

This calculation of the effect of the change in income-sise distribution consisted of reweighting the 1942 av- 
erages of market value of food for each income group within each urbanization by the 1955 percentages of family 
members in each group, with the class limits adjusted to 1942 dollar basis. The averages for each urbanization 
were then com ined using the 1942 distribution of the housekeeping population by urbanization. 
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all food and in (b) their levels. The net effect is measurable as follows: Given 
(1) the first quarter 1942 distributions of family members by income class 
within the three urbanizations and (2) the 1942 distribution of the housekeeping 
popuiation by urbanization to combine the computed averages, the changes 
from 1942 real income class averages for market value of all food to those for 
1955 accounted for 55 per cent of the change in U. 8. average market value of 
all food at home and away. On the food expenditure basis, the change in income- 
food relationships or Engel curves accounted for only 45 per cent of the change 
in U. 8. average expenditure per person. 
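The complementary calculation described above keeps the 1942 income distributions and urbanization weights fixed and substitutes the 1955 real-income class averages. The resulting change therefore measures shifts in the Engel curves themselves. The sketch below is self-contained and uses hypothetical numbers throughout. In a two-way decomposition of this kind, part of the total change comes from distribution and class averages changing together (an interaction term). Unless that part is allocated somewhere, the distribution share and the Engel-curve share need not sum exactly to 100 per cent.

```python
from typing import Dict, List

def combine(class_means: Dict[str, List[float]],
            class_shares: Dict[str, List[float]],
            urban_weights: Dict[str, float]) -> float:
    """Weighted mean over income classes within each urbanization, then across urbanizations."""
    total = 0.0
    for urb, w in urban_weights.items():
        shares = class_shares[urb]
        within = sum(m * s for m, s in zip(class_means[urb], shares)) / sum(shares)
        total += w * within
    return total / sum(urban_weights.values())

# Hypothetical figures (1942 dollars per person per week), three income classes each.
shares_1942 = {"urban": [0.4, 0.4, 0.2], "rural nonfarm": [0.5, 0.3, 0.2], "farm": [0.6, 0.3, 0.1]}
means_1942 = {"urban": [2.0, 3.0, 4.0], "rural nonfarm": [1.5, 2.5, 3.5], "farm": [1.8, 2.2, 2.6]}
means_1955 = {"urban": [2.4, 3.3, 4.2], "rural nonfarm": [2.0, 2.9, 3.8], "farm": [2.3, 2.6, 2.9]}
weights_1942 = {"urban": 0.6, "rural nonfarm": 0.2, "farm": 0.2}

base = combine(means_1942, shares_1942, weights_1942)
curve_shift = combine(means_1955, shares_1942, weights_1942) - base   # 1942 weights throughout
observed_change = 0.88   # hypothetical change in the U. S. average, 1942 to 1955
print(f"Engel-curve changes explain {100 * curve_shift / observed_change:.0f}% of the change")
```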

These changes in level and income elasticity for food value measures can be 
traced to (1) several types of changes in the food supply situation from 1942 to 
1955 that have no counterpart at one point in time, and (2) gradual changes in 
demand for food and for food marketing services which are basically unlike 
differences between such demands of high-income families and those of low- 
income families at any one time. Only a listing of the types of such changes can
be given here.25

Pertinent changes in the food supply situation include: (a) Changes in the 
relationships of food to nonfood prices and to the general price level resulting 
from changes in relative supplies and costs. (b) Changes in relative positions 
among foods; for example, the sugar situations in the springs of 1942 and 1955.
(c) Changes in the quantity and quality of individual foods available in retail 
food stores, such as supplies available for urban households in the springs of 
1942 and 1955. (d) Changes in food expenditures resulting from changes in the 
supply situation and quality of an individual food, like margarine. (e) Use of 
new foods and improved forms of processing. (f) Cost of additional marketing 
services just to move supplies from producing areas to metropolitan markets. 
These costs include those involved in the shift from home production to com- 
mercially produced supplies. 

Among the changes in the food demand situation, as from 1942 to 1955, which 
have no counterpart at one point in time, are these: (a) The shift of the popula- 
tion by urbanization. (b) Changing food needs arising from changes in the ac- 
tivity and age composition of the population as well as from increasing knowl- 
edge of nutrition. (c) Changing food tastes and emphasis on food buying in 
relation to other goods and services. (d) Increased use of prepared and partially 
prepared foods arising from changes in the way we live. (e) Short-term varia- 
tions in purchasing power not due to current disposable income. 


5. APPLICATION OF DATA ON OVERALL INCOME-FOOD RELATIONSHIPS 


Findings on income-food relationships from various types of data are directly
useful in several areas of economic research, as in analysis of the demand for
farm resources in the form of farm food commodities [9]. Subdivision of the
data on food into the use of farm resources in the form of farm commodities as
they leave the farm and the use of marketing resources to process and move
foods to ultimate consumers provides a basis for analysis of the demand for market-
ing resources in total. As more refined data are developed, it should be possible
to analyze the demand for major types of marketing resources in the form of
marketing services. Such analysis would be useful for planning adjustments in
the allocation of marketing resources.

25 For an analysis of changes in the rural sector, see [1].


Comparison of Estimates for 1955 with Actual Data 


The most obvious application of income-food relationships such as those de- 
veloped in this article is in making economic projections. The use of historical 
relationships to make projections for a recent year is a painless way of acquiring 
know-how in use of available tools. Some estimates or “expected” values for 
several food measures in 1955, based on prewar relationships, are compared with 
actual 1955 data in Table 923. 

The reader will observe the tendency to overestimate the two food quantity 
measures, probably because the relationships for the two measures, even in 
logarithms, are slightly curvilinear. The estimate for use of food marketing 
services was so far below the actual 1955 figure because prewar relationships did 
not encompass the degree of change which occurred as the country’s economy 
expanded under war stimuli and postwar investment and defense outlays. The 
same phenomenon caused the serious underestimate of the value measures 
based on prewar time-series data. 

The estimates of value measures for 1955 based on the 1942 spring survey 
data have the advantages of the wider range of variation and of being closer 
in time to postwar. The estimates made with spring 1942 regressions reflect 
the upward bias of a simple linear regression when a curvilinear one would be a 
better fit, and thus accidentally compensate for the change in level due to 
increased marketing services. The reweighting of 1942 income class averages 
with the 1955 income-size distribution provided the measures of relative im- 
portance of changes in income as a factor in increased food values used earlier 
in this article. 

Comparison of the results for 1955 using measures developed from time-series 
and cross-section data must take into account these facts, which are cited again 
although they have been noted in the development of the data: (1) Spring 1942 
was closer to 1955 than was the midpoint or average for 1929-41. (2) There
are very important conceptual differences between the two types of data. (3) 
The reweighting procedure allows (a) for curvilinearity of income-food rela- 
tionships in that part of the real income range in 1955 which had been ade- 
quately sampled in 1942 and (b) for shifts in relative income-size distribution 
whereas the time-series regressions allow only the average rate of shift in the 
period for which they are computed. 


Analysis of Changes in Market Value, 1941 to 1955 


Data on income-food relationships are combined in the second part of Table 
924 with the results of the analyses in Part I of the table to measure the impacts 
of major economic and social factors on the market value of all food between 
1941 and 1955. Most of the basic data used in Table 924 are from series in the 
Appendix of this article. Data for the measurements of change are either from 
published AMS reports or have been described in this article. Computations
are quite straightforward, and no commentary is needed. Further research on the
analysis of changes in food consumption is now in progress; it will add
refinements to this type of computation.
TABLE 924

REVISED ANALYSIS OF THE CHANGE IN THE TOTAL MARKET
VALUE OF ALL FOOD BETWEEN 1941 AND 1955

                                                              Increase, 1941-55
                                                    1941     Amount   Per cent     1955
                                                    Bil.     Bil.     of total     Bil.
                                                    dol.     dol.     increase     dol.

Total market value of all food for U. S. civilians,
  excluding taxes and tips, current dollars         21.2     38.7      100.0       59.9

I. Analysis by component
 A. Payment for productive resources
  1. To domestic producers of farm foods
     a. Farmers' civilian foods                      7.1                           18.3
     b. Home produced, farm and nonfarm              1.7                            2.3
     For increased quantity and quality
       Basis: 30 per cent increase in total
       civilian food use, domestically produced              2.6
     For price rise to get more food and
       general rise in price level
       Basis: residual                                       9.2       23.8
     Total to domestic producers                     8.8     11.8       30.5       20.6
  2. To importers and fishermen
     For increased quantity and quality
       Basis: 20 per cent increase in total
       civilian use of imported farm foods
       and [illegible]
     For price rise to get more food and
       general rise in price level
       Basis: residual                                       2.5        6.5
     Total to importers and fishermen                                               3.6
  Total for productive resources                     9.7     14.5       37.5
 B. Payments for marketing services
  For more services [remainder illegible]
     Through commercial channels
       Basis: 14 per cent more commercial food               3.6        9.3
     For additional services
       Basis: 23 per cent less 14 per cent                   2.3        5.9
     Total                                                   5.9       15.2
  For price rise to get more services and
    general rise in price level
     On same services                                       12.1       31.3
     On additional services                                  6.2       16.0
     Total                                                  18.3       47.3
  Total for marketing services                      11.5     24.2       62.5       35.7

II. Analysis by economic and social factor                   38.7      100.0
 A. Price
      Basis: derived from part I                            30.0       77.5
 B. Population increase
      Basis: 23 per cent applied to 1941 total               4.9       12.7
    1. More food                                             2.2        5.7
    2. More marketing services                               2.7        7.0
 C. Changes in income
      Basis: from survey data, 45 per cent of
      change per person                                      1.7        4.4
    1. For more food
      Basis: increase in per capita use of farm
      foods [remainder illegible]                             .5        1.3
    2. For more marketing services                           1.2        3.1
 D. Decrease in home production not due to income
    change, all for more marketing services
      Basis: increase in food moving through
      commercial channels due to [illegible]                 2.1        5.4

Data for 1941 and 1955 from series in the Appendix.
Approximately [remainder illegible].
[Illegible] [17].
[Partly illegible] the increase in the marketing bill in current dollars, given in the Appendix, by the index of the
marketing margin of the AMS market [remainder illegible].
Computed by adjusting the index of per capita food use from all sources for the proportion home produced.
Prewar relationships between income and home production indicated some increase in home production with higher income.
See “Home Production, Part II,” in the July [year illegible] issue of [title illegible].
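The bookkeeping in Part II of Table 924 can be checked directly from the table's own figures. The population effect is 23 per cent of the 1941 total. Price, which Part I obtains as a residual, equals what remains of the $38.7 billion increase after the other three factors. Each item is then expressed as a percentage of that increase. The short sketch below reproduces only this arithmetic, rounded to one decimal as in the table. It does not reproduce the survey-based estimates behind the income and home-production lines.

```python
TOTAL_1941 = 21.2        # billion dollars, from Table 924
TOTAL_INCREASE = 38.7    # 1941 to 1955, billion dollars

population = round(0.23 * TOTAL_1941, 1)   # "23 per cent applied to 1941 total"
income = 1.7                               # from survey data, as listed in the table
home_production = 2.1                      # decrease in home production not due to income
price = round(TOTAL_INCREASE - population - income - home_production, 1)   # residual

for name, amount in [("Price", price), ("Population", population),
                     ("Income", income), ("Home production", home_production)]:
    print(f"{name:16s} {amount:5.1f}  {100 * amount / TOTAL_INCREASE:5.1f}%")
```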
This analysis of changes from 1941 to 1955 poses an important question for
the economic analyst and would-be forecaster. Even given the data and rela- 
tionships for prewar years described in this article, how far could these changes 
in market value of food have been foreseen or estimated even as late as 1944? 

The essential elements in changes in U. S. total market value of all food can 
be only briefly considered here. 

(1) Population: By 1944 the baby boom was well under way, but population
projections were consistently below actuals [10].

(2) Income: Actual 1955 real income per capita far exceeded expecta-
tions even as of 1944, when wartime experience with economic expansion had al-
ready changed our ideas of the potentials of both agricultural and industrial pro-
ductivity.

(3) Quantity of food consumed: Estimates made in 1944 probably would
have been a little high.26

(4) Marketing services: Increases considerably exceeded expectations and
proved to be the key to postwar increases in market value and food expendi-
tures. The decline in home food production was general knowledge, but its
impact on the demand for marketing services has been largely ignored.

(5) Price rise: By 1944 the increase in farm prices from 1941 to 1955 was
fairly obvious, but the relatively greater increase in prices at retail was not ex-
pected.

SUMMARY OF MAJOR FINDINGS AND PROCEDURES 


Major findings of this article concerning the relationships between income
and food consumption may be summarized as follows:

(1) The relationships to real income of quantity measures of food consump- 
tion have not changed in the last 20 years. 

(2) The level of use of food marketing services has risen significantly with 
much of the change occurring in 1939-41 and 1945-47. 

(3) This change in level of food marketing services resulted in higher post- 
war levels of market value of all food consumed and in food expenditures in 
relation to income and continuation of the changes has contributed to decreases 
in the income elasticities for the food value measures. 

(4) Analysis of survey data shows that major increases in the demand for 
commercially produced food and for food marketing services in relation to 
income have come primarily among farm and rural nonfarm households and 
lower income urban households. 

(5) Increases in average consumption of food from all sources resulted from 
higher incomes whereas the use of food marketing services has exceeded expec- 
tations based on income-marketing service relationships in prewar years. 

Three procedures were essential to the development of these findings: 

(1) Use of separate regressions to derive time-series relationships between 
income and several measures of food for prewar and postwar years. 

(2) Use of survey data for analysis of changes in market value and in con-
sumption in terms of changes to be expected on the basis of changes in distribution
of the population among income groups and changes in other factors as re-
flected by changes in income-food relationships.

(3) Analysis of components of changes in market value of food, using in-
dexes for changes in quantities, adjusted for population change, and meas-
uring price changes as residuals.

26 Comparison of projections and actuals encounters many complications such as those considered by James P.
Cavin in [7].
A CROSS-SECTION ANALYSIS OF NON-BUSINESS AIR TRAVEL*


John B. Lansing and Dwight M. Blood
University of Michigan


For any individual adult what linear combination of variables yields 
the best estimate of the probability that he will take at least one non- 
business air trip in a year? Given that an individual will take at least 
one non-business air trip, what variables yield the best estimate of the 
number of such trips which he will take? This paper is a report of an 
attempt to answer these questions through multiple regressions based 
on data from a survey covering a cross-section of the adult population 
of the United States. Most of the analysis is devoted to the question 
of what factors determine the probability that an individual will take 
at least one air trip. 
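The first question asks for a linear combination of variables that estimates the probability of at least one non-business air trip. This amounts to a least-squares regression with a 0/1 dependent variable, usually called a linear probability model. The minimal NumPy sketch below fits such a model to synthetic data. The predictors (income, age) and all values are hypothetical and are not the survey's variables. Because the regression includes an intercept, the mean of the fitted values equals the sample proportion of travelers, although individual fitted values can fall outside the range 0 to 1.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
income = rng.uniform(1.0, 15.0, n)        # hypothetical: thousands of dollars
age = rng.uniform(21.0, 80.0, n)          # hypothetical: years
# Synthetic outcome: 1 if the person took at least one non-business air trip.
p_true = np.clip(0.01 + 0.008 * income - 0.0005 * (age - 21.0), 0.0, 1.0)
took_trip = (rng.uniform(size=n) < p_true).astype(float)

X = np.column_stack([np.ones(n), income, age])        # intercept plus predictors
coef, *_ = np.linalg.lstsq(X, took_trip, rcond=None)  # ordinary least squares
fitted = X @ coef                                     # estimated probabilities

print("coefficients (const, income, age):", np.round(coef, 4))
print("mean fitted probability:", fitted.mean(), "sample proportion:", took_trip.mean())
```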


1. THE PROBLEM AND THE METHOD OF INVESTIGATION 


PREVIOUS studies of the demand for air travel have relied upon aggregative
statistics for given communities or for the economy as a whole.1 The em-
pirical basis for this study is a cross-section of the adult population of the
United States. The data were collected in 1955 in a National Travel Market
Survey conducted by the Survey Research Center of the University of Michigan
for the Port of New York Authority and the New York Central System. Meth-
ods used in this survey and findings based on frequency tabulations have been
reported in detail elsewhere.2 Briefly, the sample included about 4000 inter-
views, of which half were taken in May and June 1955 and half in October and
early November of that year. Interviews were taken in 66 primary sampling
units, including the 12 largest cities in the country and 54 other counties
(or groups of counties). Within the primary sampling units smaller geographical
areas were selected, with the final stage being the selection of individual dwell-
ing units. One individual in each family was interviewed. Questions were asked
about trips by all adults in the family by air, rail, bus, and auto.

The convention was adopted that a “trip” is a round trip to a point 100 
miles or more away from home. A “business” trip is a trip “in connection with 
your work.” “Non-business” trips are trips for any other purpose, including 
primarily pleasure trips and vacation trips but also trips on personal affairs, 
for example, trips to and from school, to visit sick relatives, to obtain medical 
attention, or to move to a new home. In 1955 seven per cent of the adult popu- 
lation took one or more non-business air trips. Of these, about three out of four 
took only one such trip. 





* The analysis reported here has been carried out as a project of the Research Seminar in Quantitative Eco- 
nomics of the Department of Economics of the University of Michigan. The data were made available to the 
Research Seminar by the Survey Research Center, a unit of the Institute for Social Research of the University. 
The authors are indebted to the Statistical Research Laboratory of the University of Michigan which made available 
to them its International Business Machines Type 650 Magnetic Drum Data-Processing Machine. An oral report
of the findings was made at the summer meeting of the Econometric Society in September 1957.

1 See, for example, U. S. Department of Commerce, Civil Aeronautics Administration, Civil Air Traffic
Forecasts, 1960-1965, August 1955. See, also, Lloyd B. Aschenbeck, “Passenger Air-Line Economics,” Aeronautical
Engineering Review, Vol. 15, No. 12, December 1956, pp. 39-43.

2 John B. Lansing and Ernest Lilienstein, The Travel Market 1955, Institute for Social Research, Ann Arbor,
Michigan, 1957. A projection of air travel based on this survey has been prepared by the staff of the Port of New 
York Authority. 
In the analysis reported here the principal technique used is multiple
regression. Extensive use is made of a standard program developed for the
International Business Machines Company’s Model 650 Computer, which permits
the use of 14 independent variables.³ It computes regression coefficients and
their standard errors, tests each coefficient at the five per cent level of
significance, drops out the variables for which the coefficients are not significant,
and recomputes the coefficients for the remaining variables. By successive
computations the omitted variables are rechecked for significance one at a time
in combination with the variables found significant. After the first iteration
the tests are no longer made at the five per cent level in the ordinary
sense. Furthermore, in the later stages of this research, when the basic equation
was reformulated, hypotheses based on the first stages were “tested” against the
original data. Thus, the investigation should be regarded as an attempt to
develop hypotheses which are not contradicted by prior knowledge and are
consistent with the data.
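The drop-and-recheck screening described above can be sketched as follows. This is a minimal illustration under stated assumptions, not a reconstruction of the Model 650 program: it assumes ordinary least squares with a constant term, uses a fixed critical t-ratio of 1.96 as a stand-in for the five per cent test, runs on simulated data, and the function names `ols` and `screen_variables` are hypothetical.

```python
import numpy as np

def ols(X, y):
    """OLS with a constant term; returns slope estimates and their SRS standard errors."""
    n = X.shape[0]
    Z = np.column_stack([np.ones(n), X])             # constant term a plus the regressors
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ beta
    sigma2 = resid @ resid / (n - Z.shape[1])        # residual variance
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(Z.T @ Z)))
    return beta[1:], se[1:]                          # drop the constant term

def screen_variables(X, y, t_crit=1.96):
    """Hypothetical sketch: drop non-significant variables, then recheck each
    omitted variable one at a time alongside those retained, then refit."""
    k = X.shape[1]
    b, se = ols(X, y)                                # first pass: all k variables
    kept = [j for j in range(k) if abs(b[j] / se[j]) >= t_crit]
    for j in range(k):                               # later passes: recheck omitted ones
        if j in kept:
            continue
        b, se = ols(X[:, kept + [j]], y)
        if abs(b[-1] / se[-1]) >= t_crit:
            kept.append(j)
    kept = sorted(kept)
    b, se = ols(X[:, kept], y)                       # recompute for retained variables
    return kept, b, se

# Simulated data (hypothetical): only the first two regressors matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = 1.0 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(size=500)
selected, coef, coef_se = screen_variables(X, y)
print(selected)   # columns 0 and 1 should be among those kept
```

As the text warns, once variables are dropped and re-entered on the basis of earlier tests, the nominal five per cent level no longer holds for the later tests.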

Two further reservations must be stated with regard to the statistical tech- 
niques used though a detailed discussion of the problems is beyond the scope 
of this paper. First, multiple regression is not ideally suited to problems in- 
volving a dichotomous dependent variable. For example, it may lead to an 
estimating equation which will yield estimated probabilities which are less than 
zero for some individuals and greater than one for others. The main interest 
in this study, however, is in the question of which variables have coefficients 
significantly different from zero and for this purpose regression seems an ap- 
propriate technique despite its obvious shortcomings in other respects. The 
regression coefficients in a linear probability function such as that used here 
are identical to those which could have been obtained from a linear dis- 
criminant function fitted to the same data. (To obtain identical coefficients: 
select the 0.5 level of probability as a discriminating index, add the constant 
term to both sides of the regression equation, and divide through by the
coefficient of the first independent variable.⁴,⁵) Thus, the technique used here may
be regarded as equivalent to discriminant analysis for purposes of the present 
investigation. 
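The equivalence between the linear probability function and a linear discriminant function can be checked numerically. The sketch below uses simulated two-group data (an assumption; none of the survey data are reproduced) and compares OLS slopes on a 0/1 outcome with Fisher's discriminant direction, the inverse pooled within-group scatter matrix times the difference in group means. After each vector is divided by its first element, as in the parenthetical recipe above, the two should agree. It also illustrates that fitted values of a linear probability function are not confined to the interval from zero to one.

```python
import numpy as np

# Simulated two-group data (hypothetical): 400 non-flyers (y = 0), 100 flyers (y = 1).
rng = np.random.default_rng(42)
X0 = rng.normal(loc=[0.0, 0.0, 0.0], size=(400, 3))
X1 = rng.normal(loc=[1.0, 0.5, -0.5], size=(100, 3))
X = np.vstack([X0, X1])
y = np.concatenate([np.zeros(400), np.ones(100)])

# Linear probability function: OLS of the 0/1 outcome on X with a constant term.
Z = np.column_stack([np.ones(len(y)), X])
coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
a, b = coef[0], coef[1:]

# Fisher's linear discriminant: W^{-1} (mean_1 - mean_0), W = pooled within-group scatter.
C0, C1 = X0 - X0.mean(axis=0), X1 - X1.mean(axis=0)
W = C0.T @ C0 + C1.T @ C1
w = np.linalg.solve(W, X1.mean(axis=0) - X0.mean(axis=0))

# Divide through by the coefficient of the first variable.
print(b / b[0])
print(w / w[0])            # same ratios as the line above

# Using the 0.5 probability level as the discriminating index.
fitted = Z @ coef
predicted_flyer = fitted >= 0.5
print(fitted.min(), fitted.max())   # fitted "probabilities" are not confined to [0, 1]
```

The agreement is exact, not approximate: with a 0/1 dependent variable the total scatter matrix is the within-group scatter plus a rank-one term in the mean difference, so the OLS slopes are a positive multiple of the discriminant coefficients.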

Recently the suggestion has been made by Tobin and others associated with 
the Cowles Foundation that probit analysis or a combination of probit analysis 
and multiple regression may be appropriate for problems such as that analyzed 
here.⁶ The methods proposed involve solutions by iterative processes which
are impracticable for multivariate analysis without the use of special programs 
for high speed computers. Such programs were not available at the time the 
present research was undertaken. The authors preferred the flexibility associ- 
ated with use of a standardized technique for the present investigation, in 
which the emphasis has been on the exploration of the data rather than on a 





3 This program was developed by James G. Wendel, now Associate Professor of Mathematics, University of
Michigan. 

4 Daniel B. Suits, “Linear Probability Functions and Discrimination,” Discussion Paper of the Research Sem- 
inar in Quantitative Economics, University of Michigan, Oct., 1957. 

5 Dwight M. Blood and C. B. Baker, “Some Problems of Linear Discrimination,” Journal of Farm Economics, 
August 1958. 

6 J. Tobin, “Estimation of Relationships for Limited Dependent Variables,” Econometrica, Vol. 26, No. 1,
January 1958, pp. 24-36.
search for optimal statistical methods. As yet there is not complete consensus
among statisticians as to the relative merits of different techniques for dealing 
with problems such as that reported here. 

The second reservation arises out of the complexity of the sample design. 
The estimates of sampling errors of regression coefficients reported here assume 
that the individuals to be interviewed were selected independently. That is, 
they assume a simple random sample. In fact, the sample design was highly 
complex. The most important complication is that interviews were clustered in 
66 primary sampling units, and, within primary sampling units, were further 
clustered in small geographical areas such as city blocks. The exact nature of 
the clustering depended on such factors as the degree of urbanization of the 
area, with different techniques used for large cities, small towns, and the 
countryside.⁷,⁸ As a consequence of clustering, the number of independent
observations is not the same as the number of interviews.⁹ Unfortunately very
few estimates have yet been made of the sampling errors of multiple regression 
coefficients from complex samples. Hence, although it is almost certain that 
the sampling errors shown here are underestimates of the true sampling errors, 
the amount of the underestimate is unknown. 

Estimates of the sampling errors of proportions have been made for the 
data under discussion which do allow for the departures from simple random 
sampling in the design of the survey. These estimates yield sampling errors 
roughly twenty per cent to fifty per cent larger than those for simple random 
samples with the same number of interviews, with the smaller increment
applying to small sub-samples of the total sample of interviews, and the larger, to
proportions based on the full sample. In the absence of further information it 
seems reasonable to assume as a first approximation that the sampling errors 
calculated in this investigation understate the true errors by amounts of the 
same order of magnitude. As a rule of thumb, in tests of the null hypothesis 
for regression coefficients the authors have accepted as significant at the 
ninety-five per cent level of confidence a regression coefficient equal to or greater 
than three times its estimated standard error. They have regarded as marginal 
coefficients with values between two and three standard errors, and have 
rejected as not significant at the ninety-five per cent level regression co- 
efficients with values less than two times their standard errors as estimated 
with the usual formula for simple random samples. (Using three sigmas instead 
of two is, of course, equivalent to increasing the sampling error by fifty per 
cent.) 
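The rule of thumb can be written as a small classification function. This is a sketch under the authors' stated assumption that the survey design inflates simple-random-sample standard errors by roughly twenty to fifty per cent; the function names and the `inflation` parameter are hypothetical. The second function checks the parenthetical arithmetic: requiring three conventional standard errors is the same as requiring two standard errors after inflating each standard error by fifty per cent.

```python
def classify_coefficient(b, se):
    """Authors' rule of thumb, applied to a coefficient and its SRS standard error."""
    ratio = abs(b) / se
    if ratio >= 3.0:
        return "significant"
    if ratio >= 2.0:
        return "marginal"
    return "not significant"

def passes_two_sigma(b, se, inflation=1.5):
    """Two-sigma test after inflating the SRS standard error (hypothetical helper)."""
    return abs(b) / (inflation * se) >= 2.0

# Three sigmas on the SRS standard error == two sigmas on a 50 per cent larger one.
for b, se in [(0.30, 0.04), (0.05, 0.02), (0.01, 0.02)]:
    print(b, se, classify_coefficient(b, se), passes_two_sigma(b, se))
```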

Regression analysis requires that the variables used be scaled. As will be 
apparent from a study of Tables 931 and 940, some of the scales used were defined 
by the use of judgment. To the extent that the judgment used was poor, the 
effect is to reduce the chances of obtaining significant results. The use of 





7 For a more complete discussion of sampling in the present survey, see “Sampling Methods and Sampling
Errors,” by C. Edwin Dean, which appears as Appendix A in Lansing and Lilienstein, op. cit., pp. 83-93. 

8 See Leslie Kish, “Confidence Intervals for Clustered Samples,” American Sociological Review, Vol. 22, No. 2,
April 1957. 

9 Although in the original survey questions were asked about travel by all adults in the family, in the present
analysis only data for the respondents themselves are used. This procedure avoids further clustering of observations 
within families. It also results in omission from the analysis of adults other than the head of a family or his wife. 
The individuals omitted are primarily young single adults aged 18 or over living with their parents. 








NON-BUSINESS AIR TRAVEL 


TABLE 931

DEFINITION OF VARIABLES FOR REGRESSION ANALYSIS OF
WHETHER AN INDIVIDUAL TOOK A NON-BUSINESS AIR
TRIP IN THE LAST TWELVE MONTHS¹

Equation (1)

Variable                      Symbol   Range of Values

Non-business air travel       Y        0 if no non-business air trip in last 12 months
                                       1 if took such a trip
Experience in flying          X1       0 if never had taken an air trip
  before this year                     1 if had taken an air trip
Rail travel                   X2       0 if no rail trip in the last 12 months
                                       1 if took such a rail trip
Vacation                      X3       0 if no paid vacation
                                       1 if had a paid vacation of a week or more
Income change                 X4       1 if income much larger
                                       2 if income somewhat larger
                                       3 if income same or if change not ascertained
                                       4 if income somewhat smaller
                                       5 if income much smaller
Air mindedness                X5       0 if no mention of air mindedness as a reason
                                         why people might fly
                                       1 if mentions air mindedness
Fear of air sickness          X6       0 if no mention of fear of air sickness
                                       1 if mentions fear of air sickness as a reason
                                         why people might not fly
Cheapness vs. expensiveness   X7       0 if mentions expensiveness as a disadvantage
  of air travel                          of flying
                                       1 if mentions neither expensiveness nor
                                         cheapness; or if mentions both
                                       2 if mentions cheapness of flying
Income                        X8       Amount of family income:
                                         Under $1000
                                         $1000-1999
                                         $2000-2999
                                         $3000-3999
                                         $4000-4999
                                         $5000-5999
                                         $6000-7499
                                         $7500-9999
                                         $10,000-14,999
                                         $15,000-19,999
                                         $20,000 and over
Square of income              X9       Amount of family income:
                                         .00025   Under $1000
                                         .00225   $1000-1999
                                         .00625   $2000-2999
                                         .01225   $3000-3999
                                         .02025   $4000-4999
                                         .03025   $5000-5999
                                         .04489   $6000-7499
                                         .07569   $7500-9999
                                         .15625   $10,000-14,999
                                         .30625   $15,000-19,999
                                        1.60000   $20,000 and over
Life cycle                    X10      0 if have children (also 0 if not ascertained)
                                       1 if married with no children; or if over 45
                                         and single
                                       2 if under 45, single
Metro area                    X11      0 if does not live in one of the twelve largest
                                         metropolitan areas
                                       1 if does live there
Rural area                    X12      0 if does not live in a rural area (population
                                         2,500 or less)
                                       1 if does live in a rural area
Education of head of family   X13      1 if no education or a grade school education
                                       2 if some high school (also if not ascertained)
                                       3 if completed high school
                                       4 if some college
                                       5 if a college degree
Sex of this adult             X14      1 if a man
                                       2 if a woman

1 Based on interviews taken in October 1955.

dummy variables in problems of this type has been discussed recently in an
article by Suits.¹⁰

2. THE ANALYSIS

An equation of the usual form for linear regressions was used:

$$Y = a + b_1X_1 + b_2X_2 + b_3X_3 + b_4X_4 + b_5X_5 + b_6X_6 + b_7X_7 + b_8X_8 + b_9X_9 + b_{10}X_{10} + b_{11}X_{11} + b_{12}X_{12} + b_{13}X_{13} + b_{14}X_{14} + u \tag{1}$$

where the dependent variable measures whether or not an individual took a
non-business air trip in the twelve-month period prior to interview, and the
independent variables are defined in Table 931.

10 Daniel B. Suits, “Use of Dummy Variables in Regression Equations,” Journal of the American Statistical
Association, Vol. 52, No. 280, December 1957, pp. 548-551.

Brief comments about these variables may be in order. It was anticipated
that experienced air travelers would be more likely to fly and that X1 would
have a positive sign. Two possible and conflicting hypotheses arose in connection
with rail travel (X2): that people who travel frequently are likely to use
both modes, with the consequence that use of one will be correlated positively
with use of the other, and that the two may be substitutes, so that people who
use one mode will tend not to use the other. It was expected that people who
have no paid vacation, or one of less than a week, may travel by air rather than by
more leisurely methods (X3). In defining this variable (X3) people who are
self-employed were classified as having no paid vacation. For income change (X4)
the hypothesis was that people whose income has risen may spend more freely.

The three attitudinal variables, X5, X6, and X7, all represent answers to a
single open question, “Why do you think some people travel by plane? What
might keep some people from traveling by plane?” The hypotheses were that
those who mention “air mindedness” (including the thrill of flying) or think
of air travel as cheap are more likely to fly, while those who mention fear of
air sickness are less likely to fly.

Both income and its square were used (X8 and X9) to allow for the possibility
of a curvilinear relation. A travel market study by Lansing and Lilienstein had
suggested differences in people’s use of air depending on their stage in the life
cycle (X10), with young, single people most likely to fly and families with
children under 18 least likely.² It was anticipated that people living in one of
the twelve largest cities would be more likely to fly than those living elsewhere
(X11) and that the reverse would be true for those living in rural areas (X12).
Education (X13) was introduced as a proxy for social status, on the argument
that high-status people are more likely to fly (even if income is taken into
account). Finally, the possibility of sex differences in non-business travel was
considered (X14).
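Including both income and its square lets the fitted income effect bend rather than rise in a straight line. The tabulated values of the square of income (X9) in Table 931 are consistent with squaring a representative income for each bracket, expressed in thousands of dollars, and dividing by 1000. The representative incomes below (bracket midpoints, with $6,700, $8,700 and $40,000 used for the $6000-7499, $7500-9999 and $20,000-and-over brackets) are inferred from the table values, not stated in the source, so treat them as an assumption.

```python
# Assumed representative incomes (thousands of dollars) for each Table 931 bracket.
brackets = ["Under $1000", "$1000-1999", "$2000-2999", "$3000-3999", "$4000-4999",
            "$5000-5999", "$6000-7499", "$7500-9999", "$10,000-14,999",
            "$15,000-19,999", "$20,000 and over"]
income_k = [0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.7, 8.7, 12.5, 17.5, 40.0]

# Square-of-income values as printed in Table 931.
table_x9 = [0.00025, 0.00225, 0.00625, 0.01225, 0.02025, 0.03025,
            0.04489, 0.07569, 0.15625, 0.30625, 1.60000]

for label, inc, tab in zip(brackets, income_k, table_x9):
    x9 = inc ** 2 / 1000.0            # squared income, rescaled by 1/1000
    assert abs(x9 - tab) < 1e-9, label
    print(f"{label:>18}: {x9:.5f}")
```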

It was known before starting the investigation that income has a powerful 
effect on the dependent variable.¹¹ It was anticipated that the effect of one
or more of the other independent variables might depend on the value of
income. Accordingly, the sample was divided into three income groups and
equation (1) was estimated for each group separately, thus allowing for
interactions between any of the independent variables and income. The price
paid for this stratagem, of course, was a reduction in the degrees of freedom
available for computing the estimating equations.
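Estimating equation (1) separately within each income group gives the same point estimates as one pooled equation in which the constant and every variable are interacted with a full set of income-group dummies. The sketch below checks this on simulated data; the data, the group labels and the helper name `fit` are hypothetical. The standard errors need not coincide, because the pooled fit assumes a common error variance across groups, and each group's coefficients rest only on that group's observations.

```python
import numpy as np

def fit(Z, y):
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return coef

rng = np.random.default_rng(7)
n, k, groups = 600, 3, 3
X = rng.normal(size=(n, k))
g = rng.integers(0, groups, size=n)                   # hypothetical income-group label
slopes = np.array([[0.5, 0.1, 0.0], [0.3, 0.1, 0.2], [0.7, 0.0, 0.1]])
y = 0.1 * g + np.einsum("ij,ij->i", X, slopes[g]) + rng.normal(scale=0.5, size=n)

# (a) Separate regressions, one per group: [constant, slopes] for each group.
separate = [fit(np.column_stack([np.ones((g == j).sum()), X[g == j]]), y[g == j])
            for j in range(groups)]

# (b) One pooled regression with a full set of group-by-variable interactions.
blocks = []
for j in range(groups):
    d = (g == j).astype(float)[:, None]               # group dummy as a column
    blocks.append(np.column_stack([d, d * X]))        # group constant and slopes
pooled = fit(np.hstack(blocks), y).reshape(groups, k + 1)

print(np.allclose(np.array(separate), pooled))        # the two sets of estimates agree
```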

The results of the calculations are shown in Table 934. Experience (X1) is
highly significant in each group, with a regression coefficient at least seven
times its standard error. Contrary to expectations, however, the regression
coefficient for experience is much smaller for the income group $5000-9999 than
for the other two groups. On a priori grounds, it seemed logical to expect that 
the coefficient would be the same for the three income groups, or would differ 
from group to group in some simple and regular manner. The coefficient for 
the middle income group might well have been half-way between that for the 
other groups, or 0.6, instead of the value observed, 0.3. Since the sampling 
error of this coefficient is only 0.04, the difference between the observed value 





11 See, for example, New York's Air Travelers, a report prepared by the staff of the Port of New York Authority,
Table 15, p. 54. 
of 0.3 and the “expected” value of 0.6 cannot be attributed to sampling error.
This peculiar result led to a series of cross-tabulations described below. 
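The arithmetic behind this conclusion is short. Using the figures in the text (an observed coefficient of 0.3, a "half-way" expectation of 0.6, and a standard error of 0.04), the gap is 7.5 standard errors. Even if the standard error is inflated by fifty per cent to allow for clustering, in line with the authors' rule of thumb, the gap remains 5 standard errors.

```python
observed, expected, se = 0.3, 0.6, 0.04
gap_in_se = (expected - observed) / se              # about 7.5 SRS standard errors
gap_inflated = (expected - observed) / (1.5 * se)   # about 5 after a 50 per cent inflation
print(round(gap_in_se, 2), round(gap_inflated, 2))
```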

Rail travel (X2) appeared as possibly significant in only one of the three
groups and there at not much above the two sigma level. Allowing for the 
understatement of sampling errors diminishes the importance of this result. 

Air mindedness (X5) appeared as significant in Group III only. The
significance of this result is also in doubt. A calculation for all income groups
including air mindedness (X5), sex (X14), and experience (X1) showed a coefficient
for air mindedness with a value of only half of its standard error. The result
for air mindedness in the top income group is in the marginal range, and 
the coefficient may be only the result of chance fluctuation. The sign of the 
coefficient is negative, the opposite of that predicted. The underlying problem 
here may be that those coded as “air minded” may include both people who 
think of flying as something unusual, exciting and perhaps risky and people 
who are converted to travel by air for more prosaic reasons. The other two 
attitudes tested, fear (X6) and cheapness (X7), showed no significant result.

The effect of income was taken into account by the use of three income 
groups. Hence, it is not surprising to find no significant coefficients for income 
(X8) and square of income (X9).

Life cycle (X10) does show a positive effect (in the direction predicted) for
the largest group, Group I. A calculation for the full sample using experience
(X1), life cycle (X10) and metro area (X11) also shows a small positive effect for
this variable. Income and stage in the life cycle are known to be correlated.
It was felt that this fact might be related to the absence of an effect for this vari- 
able in Groups II and III. Further experimentation with the variable seemed 
justified. 

Living in a metropolitan area (X11) showed a positive effect, as predicted,
for Group I. This effect is small, however, especially for the sample as a whole, 
and at the borderline of significance. This result later led to a reformulation of 
the variable. 

Neither living in a rural area (X12) nor education (X13) showed any effect.
There does seem to be a tendency, however, for women to be more likely to take 
a non-business air trip than men. The effect is small enough to escape detection 
except for the largest group. 

The next stage in the analysis took the form of a series of frequency tabula- 
tions which had two purposes. First, it was necessary to straighten out the 
income-experience-life cycle tangle. A formulation was sought which fitted the 
data and which also could be fitted into existing knowledge of the variables. 
Second, it seemed essential to explore further the meaning of “experience.” 
Why should it be so powerful as a variable? 

With regard to the first problem, Table 936 shows the two-way relation be- 
tween experience and air travel. Of the inexperienced, only 2 per cent took a 
trip, compared to 15 per cent of the experienced. Table 936a shows the effect of 
experience within three income groups. It shows essentially what had been 
revealed already by equation (1). 
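When experience is the only regressor, the least-squares coefficient on the 0/1 experience dummy is exactly the difference between the two sample proportions, and the constant term is the proportion among the inexperienced; this is why a two-way table such as Table 936 and a dummy-variable regression tell the same story. The counts below are hypothetical, chosen only so that the proportions match the 15 per cent and 2 per cent in Table 936.

```python
import numpy as np

# Hypothetical counts: 30 of 200 experienced and 16 of 800 inexperienced took a trip.
n_exp, trips_exp = 200, 30
n_inexp, trips_inexp = 800, 16

x = np.concatenate([np.ones(n_exp), np.zeros(n_inexp)])             # X1: experience
y = np.concatenate([np.r_[np.ones(trips_exp), np.zeros(n_exp - trips_exp)],
                    np.r_[np.ones(trips_inexp), np.zeros(n_inexp - trips_inexp)]])

Z = np.column_stack([np.ones_like(x), x])
(a, b1), *_ = np.linalg.lstsq(Z, y, rcond=None)

print(a, b1)                                        # about 0.02 and 0.13
print(trips_exp / n_exp - trips_inexp / n_inexp)    # difference in proportions
```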

Table 937 shows the two-way relation between stage in the life cycle and 
air travel. The probability of taking an air trip is relatively high for young, 
TABLE 936

EFFECT OF EXPERIENCE AND INEXPERIENCE UPON AIR TRAVEL:
PER CENT OF EXPERIENCED AND INEXPERIENCED ADULTS WHO
TOOK A NON-BUSINESS AIR TRIP DURING THE TWELVE
MONTHS PRIOR TO INTERVIEW

                  Took Non-Business   Did Not Take Non-            Number of
                  Air Trip            Business Air Trip    Total   Interviews
Experienced             15%                 85%            100%       195
Inexperienced            2%                 98%            100%       800

single people and young married people with no children. It declines for married 
people with children and rises for older couples whose children have left home 
and for older single people. 

Income, experience, and stage in the family life cycle are introduced together 
in Table 937a. The difficulty with this table, of course, is that the number of
observations in each cell becomes small. But the data do suggest that the effect 
of experience and income may depend on stage in the life cycle. Consider the 
columns for all incomes. At all stages of the life cycle only one to three per 
cent of inexperienced flyers took a trip. Of the experienced flyers who are mar- 
ried with children only eight per cent took a trip. Of the experienced flyers at 
other stages in the life cycle, however, twenty-six to twenty-eight per cent took 


a trip. The data suggest strongly that the effect of experience depends on stage 
in the life cycle. When the original regression equation was revised, therefore, 
the population was divided into groups by stages in the life cycle rather than 
by income groups. 

The second problem which was explored by the use of frequency tabulations 
was the interpretation of experience. One possible interpretation is that ex- 


TABLE 936a

THE EFFECT OF INCOME AND EXPERIENCE UPON NON-BUSINESS
AIR TRAVEL: PER CENT OF TOTAL EXPERIENCED AND
INEXPERIENCED ADULTS WHO TOOK A NON-BUSINESS AIR
TRIP, BY INCOME CLASS

Family                             Took Non-Business   Did Not Take Non-    Number of
Income            Experience       Air Trip            Business Air Trip    Interviews
Under $5000       Experienced            11%                 89%                 75
                  Inexperienced           1%                 99%                532
$5000-9999        Experienced            12%                 88%                 86
                  Inexperienced           3%                 97%                221
$10,000 or more   Experienced            38%                 62%                 29
                  Inexperienced          10%                 90%                 29



TABLE 937

EFFECT OF STAGE IN THE LIFE CYCLE UPON AIR TRAVEL:
PER CENT OF THOSE IN DIFFERENT STAGES WHO
TOOK A NON-BUSINESS AIR TRIP

                                 Took a Non-    Did Not Take a            Number
                                 Business Air   Non-Business              of
Stage                            Trip           Air Trip         Total    Interviews
Young, single                        9%             91%          100%         53
Young, married, no children          5              95           100%         77
Married, with children:
  Youngest under two                                             100%
  Youngest 2-4                                                   100%        125
  Youngest 5-14                                                  100%        182
  Youngest 15-17                                                 100%         29
Married, over 45, no children                                    100%        224
Over 45, single                                                  100%        142
Total¹                               5%             95%          100%        957

1 This table excludes those for whom stage in the life cycle was not ascertained or who do not fall in one of the
stages shown (for example, widows with young children).


perienced people are in a particular situation which makes it appropriate for 
them to fly. For example, they may take the same trip once a year by air to 


visit the same destination for the same purpose, and for this particular trip air 
travel may have certain continuing advantages. A second possible interpretation
is that experienced air travelers become converts to air travel. Their
attitudes toward it may change as a result of their experiences.

TABLE 937a

EFFECT OF LIFE CYCLE UPON PROBABILITY OF EXPERIENCED
AND INEXPERIENCED FLIERS TAKING A NON-BUSINESS AIR
TRIP, BY INCOME CLASS

                                       Per Cent Who Took a Trip
                               Income Under $5000     Income $5000-9999
                               Experi-   Inexperi-    Experi-   Inexperi-
                               enced¹    enced²       enced¹    enced²
Married, with children
Over 45, married, no
  children; over 45, single
Under 45, single; under 45,
  married, no children
Number of respondents

[Cell entries, and the heading of the remaining income columns, are not legible in this copy.]

1 Experienced refers to those adults who took a non-business air trip earlier than 12 months prior to interview.
2 Inexperienced refers to those adults who had not taken a non-business air trip at least 12 months prior to
interview.
* Percentage not reported for cells containing under 20 observations.

The latter possibility is explored in Tables 939 and 939a which compare dis- 
advantages and advantages of air travel respectively. As noted earlier, these 
comments were made in response to the questions: “Why do you think some 
people travel by plane? What might keep some people from traveling by plane?” 
Answers to these questions by those who have used air travel and those who have
not are shown separately. As far as the disadvantages are concerned, of those 
who have used air, fourteen per cent mention fear of air sickness compared to 
six per cent of those who never have used air, a difference which is significant 
at the five per cent level. Otherwise the two groups mention about the same 
reasons why “people” might not travel by air. Only five per cent of those who 
have used air cannot think of any disadvantage, compared to fourteen per cent 
of the rest of the population. A parallel finding emerges in Table 939a. Only 
three per cent of those who have used air mention no reason why “people” 
might travel by air, compared to fourteen per cent of the rest of the population. 
A possible interpretation of these results is that people who have traveled by 
air know more about it and have more to say on the subject. 
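The significance statement for fear of air sickness can be checked with a conventional two-sample test of proportions, using the percentages (14 and 6) and the interview counts (214 and 784) from Table 939. This is a sketch: the authors do not say which test they used, so the pooled-variance z statistic and the division by 1.5 to allow for the survey design (carried over from their rule of thumb) are assumptions, and `two_proportion_z` is a hypothetical helper.

```python
import math

def two_proportion_z(p1, n1, p2, n2):
    """Pooled two-sample z statistic for a difference in proportions (SRS formula)."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

z = two_proportion_z(0.14, 214, 0.06, 784)   # used air vs never used air
z_design = z / 1.5                            # allow for clustering, as in the text
print(round(z, 2), round(z_design, 2))        # both exceed 1.96
```

Under these assumptions the difference stays significant at the five per cent level even after the allowance for clustering.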

People who have used air are more likely than those who have not to refer 
to air travel as cheap or comfortable. They are also more likely to refer to its 
speed, though eighty-four per cent of those who have never used air also 
mention its speed. Taken together, these data do tend to confirm the existence 
of an initiation effect on people’s attitudes toward air travel arising out of the 
experience of flying. Of course, it does not follow that the effect of “experience” 
as a variable can be explained wholly in this manner. 

The third stage of the analysis is an attempt to develop a revised equation 
to take into account what had been found in the first and second stages. The 
equation is of the same form as equation (1), and the variables are defined in 
Table 940. Experience (X1) was defined as before. Occupation (X2) was
introduced as a measure of socio-economic status for the same reasons education
had been introduced in equation (1). Rural-metro (X3) represented a
reformulation of two variables from the earlier calculation. Income change (X4) was
repeated. Life cycle (X5) was also repeated, though with an expanded
classification.

Distance to airport (X6) and frequency of service (X9) represented attempts
to improve on the old X11, which simply classified people as living or not living
in a large metropolitan area. The hypothesis is that the aspects of living in a
large city which are important are the distance to the airport and the frequency
of air service. These variables are measured only roughly. Distance to airport
is actually the average distance for all interviews in a single primary sampling
unit. Frequency of service is measured by the total number of air carrier
aircraft departures in fiscal 1954. 
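The distance categories of Table 940 amount to a simple binning rule. The table lists "under 8" and "9-19" and so leaves a distance of exactly 8 miles unassigned; the sketch below assumes distances rounded to whole miles and places 8 in code 2. The function name `distance_code` is hypothetical.

```python
import bisect

# Inclusive upper bounds of codes 2-5 in Table 940; code 1 is "under 8".
_UPPER = [19, 30, 60, 124]

def distance_code(miles):
    """Code 1-5 for average distance (miles) to the airport; 8 miles assumed -> code 2."""
    m = round(miles)
    if m < 8:
        return 1
    if m > 124:
        raise ValueError("distance outside the range coded in Table 940")
    return 2 + bisect.bisect_left(_UPPER, m)

print([distance_code(m) for m in (5, 12, 25, 45, 100)])   # [1, 2, 3, 4, 5]
```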

Cheapness (X7) and comfort (X12) were introduced because of their
importance as shown in Table 939a. Fear (X11) was introduced on the hypothesis
that there may be something unusual about the few people who do not mention 
fear in answer to the question asked. 

The only other change was the addition of a variable for business air travel 
TABLE 939

ATTITUDES TOWARD DISADVANTAGES OF AIR TRAVEL RELATED
TO USE OF AIR TRAVEL

                               Per Cent of Those            Per Cent of Those
                               Mentioning Specific          Mentioning Disadvantage
Disadvantages of               Disadvantage                 Who Traveled by Air
Air Travel                     Never Used Air   Used Air¹   Last Year                 Number
Expensive                            30            31              5
Fear of sickness                      6            14             10
Fear (general)                       75            82              7
Health                                2             6             12
Miss scenery; Speed                   1             2
Undependable in bad weather           2
Bad connections
Other disadvantages
No disadvantages                     14             5
Number of interviews                784           214

1 Includes those who used air for the first time in the year prior to interview.
2 Less than 1 per cent.
3 Percentages not reported for cells containing under 20 observations.


TABLE 939a

ATTITUDES TOWARD ADVANTAGES OF AIR TRAVEL RELATED TO
USE OF AIR TRAVEL

                               Per Cent of Those            Per Cent of Those
                               Mentioning Specific          Mentioning This Advantage
Advantages of                  Advantage                    Who Traveled by Air
Air Travel                     Never Used Air   Used Air    Last Year                 Number
Cheap
Safe
Speed                                84
Comfort
Facilities
Clean
Good connections
Convenience
Easy with children
Air minded
Other advantages
No advantages                        14             3
Number of interviews                784           214

[Other cell entries are not legible in this copy.]

1 Percentages not reported for cells containing under 20 observations.
2 Less than 1 per cent.
TABLE 940

DEFINITION OF VARIABLES FOR REGRESSION ANALYSIS OF
WHETHER AN INDIVIDUAL TOOK A NON-BUSINESS AIR
TRIP IN THE LAST TWELVE MONTHS

Revised Regression Equation

Variable                      Symbol   Range of Values

Non-business air travel       Y        Same as for (1)
Experience                    X1       Same as for (1)
Occupation of this adult      X2       0 if unemployed, farmer, retired, laborer
                                       1 if clerical and kindred worker, housewife,
                                         student, craftsman, foreman, operative
                                       2 if sales worker
                                       3 if self-employed, professional or managerial
                                         or technical worker
Rural-metro                   X3       0 if lives in rural town, congested areas under
                                         2,500, or in the country
                                       1 if lives in place of 2,500 to 50,000 or in
                                         rural suburbs of large metropolitan areas
                                       2 if lives in one of the twelve large
                                         metropolitan areas
Income change                 X4       Same as income change for (1)
Life cycle                    X5       1 if married, youngest child under 2
                                       2 if married, youngest child 2-4
                                       3 if married, youngest child 5-14
                                       4 if married, youngest child 15-17
                                       5 if over 45 and married with no children at home
                                       6 if over 45 and single
                                       7 if young, married, with no children
                                       8 if young, single
Distance to airport           X6       Distance to airport from center of primary
                                         sampling unit (miles):
                                       1 if under 8 (includes all of the seven largest
                                         cities except Detroit)
                                       2 if 9-19
                                       3 if 20-30
                                       4 if 31-60
                                       5 if 61-124
Cheapness                     X7       0 if no mention of cheapness as a general
                                         advantage of flying
                                       1 if mentions cheapness
Income                        X8       Same as income for (1)
Frequency of service          X9       Thousands of air carrier aircraft departures in
                                         fiscal 1954 (nearest airport):
                                         1-3
                                         4-7
                                         8-14 (includes not ascertained)
                                         15-24
Business air travel           X10      0 if took no business air trip in the last 12 months
                                       1 if took such a trip
Fear                          X11      0 if mentions fear as a general disadvantage of
                                         flying
                                       1 if does not mention fear
Comfort                       X12      0 if no mention of comfort
                                       1 if mentions comfort as a general advantage of
                                         flying
Education                     X13      Same as education for (1)
Sex                           X14      Same as sex for (1)

(X10). Here the hypothesis was that people who travel by air on business may 
be unlikely to take non-business air trips. They may combine a pleasure trip 
with a business trip. 

The calculations were carried out for three life cycle groups separately. 
Results appear in Table 942. The simple correlation coefficients between the 
independent variables are shown in Table 943. Again experience (X1) emerges
as highly significant, but with the smallest coefficient for married people with
children. For this group, however, income (X8) has a high coefficient. Thus,
married people with children are not likely to fly for non-business reasons unless
their income is high. For this group, even with “family plan” rates, the cost of
air travel is high relative to travel by automobile. It is reasonable to conclude 
that this group is the one where income makes the most difference in the pro- 
pensity to fly. 

Of the two measures of social status other than income, occupation (X2)
and education (X13), neither seems to have any effect.

The effect of place of residence is captured by distance from the airport
(X6) rather than by the rural-metro classification or by frequency of service
(X9). The effect of distance appears only for Group II. The coefficient is about
two and a half times its standard error and the sign is as predicted. The
statistical argument for the importance of the variables is not impressive. The
measure of distance, however, is crude, and the most reasonable interpretation
seems to be that the coefficient is low because the distance or travel time from
each individual dwelling to the nearest airport was not measured.
TABLE 943


SIMPLE CORRELATION COEFFICIENTS BETWEEN INDEPENDENT 
VARIABLES USED IN REVISED REGRESSION EQUATION 


The three entries in each cell are the coefficients as separately computed for the three 
stages in the family life cycle used in Equation 2: respectively, Group I, married with 
children; Group II, older, married, with no children; and Group III, young, single, and 
young, married with no children: 








[Correlation matrix not recoverable from the scan; variables shown are Experience, Occupation, Life cycle, Income, Frequency of service, Business air travel, Comfort, and Education.]
It is mildly surprising to find that income (X8) is not a factor in whether
young, single people or young married people with no children take an air trip.
Part of the problem may be that few people in this group have large incomes.
For the older, married people with no children, income has a coefficient in the
predicted direction but less than three times its standard error. For the
married people with children, the income coefficient is clearly reliable.

Of the three attitudinal variables, cheapness (X7), fear (X11), and comfort (X12),
only cheapness has a coefficient large enough to be clearly significant, and that
only for Group II. However, in view of the correlations among cheapness
(X7), comfort (X12), and experience (X1), the first two variables may have an
effect which cannot be separated from the effect of experience.


TABLE 944 


NUMBER OF NON-BUSINESS AIR TRIPS BY FAMILY INCOME 
(Percentage Distribution of Adults Who Took One or More Non-Business Air Trips) 





[Table body not recoverable from the scan. Columns: family income groups (one legible heading: $10,000-14,999). Rows: one, two, three, and four or more non-business air trips; total; number of interviews.]

1 This category includes income not ascertained. 


The results for business air travel (X10) are clearly significant, while those
for sex (X14) are in the range where statistical significance is uncertain. One
possible interpretation of the results for these variables is that men who take 
air trips on business may try to arrange to take their wives along. Such a 
trip is here classified as a non-business trip for the wife. The data suggest 
that this tendency does not exist in families with children, or, at least, that it 
is more common where there are no children at home. It may also be true that 
young, single women are more likely to take non-business air trips than other 
groups in the population. 

The final phase of the research concerned the number of trips taken. Of all
adults who took a non-business air trip in the year prior to interview,
seventy-two per cent took only one such trip. The original hypothesis was that the
probability that an adult will take more than one non-business air trip will
depend on his income. Table 944 shows the relation between family income and
number of non-business air trips. The results are in the predicted direction, but
the differences between income groups are very small. Of those with incomes
over $15,000, about forty per cent took more than one trip, compared to about
twenty-five per cent of those with incomes under $5000.
TABLE 945


NUMBER OF NON-BUSINESS AIR TRIPS BY STAGE IN THE LIFE 
CYCLE (PERCENTAGE DISTRIBUTION OF ADULTS WHO TOOK 
ONE OR MORE NON-BUSINESS AIR TRIPS) 





                                        Stage in the Life Cycle

                                                   Over 45, Married,    Young, Single;
Number of Non-                       Married,      No Children;         Young, Married,
Business Air Trips       Total       Children      Over 45, Single      No Children

One                        72           ..               69                   67
Two                        14           ..               21                   13
Three                       5           ..                4                    6
Four or more                9           ..                6                   14

Total                     100          100              100                  100
Number of interviews      407          146              142                  119

















1 Includes those for whom stage in the life cycle was not ascertained or who do not fall in one of the usual stages 


Since stage in the life cycle had proved of importance in the earlier analysis,
Table 945 was prepared showing the number of non-business air trips for air
travelers at different stages in the life cycle. The results are in the direction
which one would predict from the earlier findings: seventy-eight per cent of
married adults with children took only one trip compared to about sixty-eight
per cent of other adults. The difference between the two proportions, however,
is not statistically significant. (This statement is based on calculations which
do take into account the complexity of the sample design.)
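Why the sample design matters here can be illustrated roughly (this is not the authors' calculation). The sketch applies a textbook two-proportion z-test to the figures quoted from Table 945: 78 per cent of 146 married adults with children, against the interview-weighted average of the other two groups (69 per cent of 142 and 67 per cent of 119). It assumes the published percentages behave like simple random samples of those interview counts, and the design effect of 1.5 is a hypothetical value, not one estimated from this survey.

```python
import math


def two_proportion_z(p1: float, n1: int, p2: float, n2: int, deff: float = 1.0) -> float:
    """Pooled two-proportion z statistic for p1 - p2.

    deff is a design effect: the variance inflation of the actual sample
    design relative to simple random sampling (deff=1.0 is plain SRS).
    """
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se_srs = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / (se_srs * math.sqrt(deff))


# Interview counts from Table 945; 0.78 is the share quoted in the text.
n_children = 146
other_groups = {"over_45": (0.69, 142), "young": (0.67, 119)}
n_other = sum(n for _, n in other_groups.values())
p_other = sum(p * n for p, n in other_groups.values()) / n_other

z_srs = two_proportion_z(0.78, n_children, p_other, n_other)
# deff = 1.5 is a hypothetical illustration, not the survey's value.
z_design = two_proportion_z(0.78, n_children, p_other, n_other, deff=1.5)

print(f"other adults, one trip only: {p_other:.3f}")
print(f"z (simple random sampling):  {z_srs:.2f}")
print(f"z (assumed deff = 1.5):      {z_design:.2f}")
```

Treated as a simple random sample, the difference would give z of roughly 2.1, nominally significant at the 5 per cent level. A moderate design effect brings it to about 1.7, which shows how allowing for clustering can reverse the verdict.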

Table 945a compares the number of trips for travelers with and without
experience of air travel prior to "last year." Again, the difference between the
groups is small. Some sixty-six per cent of the experienced report only one trip,
compared to seventy-seven per cent of the inexperienced.


TABLE 945a


NUMBER OF NON-BUSINESS AIR TRIPS BY EXPERIENCE WITH AIR
TRAVEL (PERCENTAGE DISTRIBUTION OF ADULTS WHO TOOK
ONE OR MORE NON-BUSINESS AIR TRIPS)


Number of Non-Business Air Trips    Total¹    Experienced²    Inexperienced²

One                                   72          66              77
Two                                   14          16              17
Three                                  5           7               1
Four or more                           9          11               5

Total                                100         100             100
Number of interviews                 407¹        140              60


1 The total column includes interviews from both spring and fall waves. The other columns do not.
2 Experienced air travelers are those who had taken an air trip prior to the 12 months immediately preceding
the survey.


TABLE 946


NUMBER OF NON-BUSINESS AIR TRIPS BY FREQUENCY OF BUSINESS
AIR TRAVEL (PERCENTAGE DISTRIBUTION OF ADULTS WHO TOOK
A TRIP BY AIR FOR ANY REASON)


[Table body not recoverable from the scan. Columns: number of business air trips. Rows: none, one, two, three, and four or more non-business air trips; total; number of interviews.]


1 Less than 1 per cent.


The final table in this series, Table 946, includes all air travelers and relates
the number of business air trips to the number of non-business air trips. Since
the table includes only adults who took an air trip for some reason, everyone
who took no business air trip took one or more non-business trips. Of those
who did take business air trips, about 8 out of 10 took no non-business air
trip. The number of non-business air trips is not closely related to the number
of business air trips.


8. SUMMARY AND CONCLUSIONS


The probability that an individual will take a non-business air trip depends 
upon four different classes of factors: 

1. Economic situation. The probability that an adult will take one or more
non-business air trips increases with his income. The effect of his income is 
greatest if he is married and has children. It is small or perhaps non-existent 
if he is young and single or young and married but with no children. 

Whether an individual has a paid vacation of a week or more makes little or 
no difference in the probability that he will take an air trip. 

Women are more likely than men to take a non-business air trip. Part of the 
explanation is a tendency for men traveling on business to take their wives 
along. This tendency does not seem to exist if the family includes children. 

2. Sociological situation. The stage of a person's family in the life cycle
is important in understanding the tendency to travel by air. Families
with children tend not to travel by air. For this group the size of the family
income is important in determining the probability that they will fly.

The social status of an adult seems to have little or no effect on the
probability that he will take non-business trips by air except to the extent that
income itself is a factor.
3. Availability of air travel. The most reasonable interpretation of the data
is that distance to the air terminal is a factor in determining whether an adult
will travel by air. It is this distance which seems to be relevant rather than the
type of community in which he lives.

4. Attitudes and experience of the individual. Whether an individual has taken
an air trip is an important factor in determining the probability that he will
take one or more non-business air trips in a year. Experience with air travel
brings with it a shift in attitudes, notably an increased feeling that air travel is
cheap and comfortable.





APPROACHES TO NATIONAL OUTPUT MEASUREMENT 


Paul B. Simpson
University of Oregon 


Products can be rendered homogeneous in terms of substitutability
in consumption. Final products are substitutes in consumer choice, and
valuations are defined by the marginal sacrifice incurred. A welfare
interpretation of gross product requires that an average or typical
consuming unit be assumed to exist. The Laspeyre and Paasche quantity
indexes measure levels of consumption of such a unit.

Products rendered homogeneous in terms of output substitutability
are valued with marginal rates of physical substitutability. The latter
are given, but only approximately, by unit after-tax factor costs,
including capital allowances, under certain conditions of homogeneity
in production and monopoly. The proportion of output represented by
government or any other sector will be different for this measure than
for the consumer measure. Price indexes will also differ. An increase in
government product, for example, will raise consumer prices without
affecting factor-cost prices.


PRODUCTS can be rendered homogeneous for measurement purposes by
establishing substitutability either in consumer choices or in production
processes. These methods are taken from economic theory, which is based
largely on the value implications of such substitution possibilities. The relation
of these two principles to national output measures still requires clarification,
however. Hicks [6] in his classic piece pointed out the desirability of two
measures. The Office of Business Economics of the Department of Commerce
(OBE) seems to approve of the dual approach, since it defines two measures,
gross national product and national income, though it has not clearly
associated these two aggregates with the two principles of substitution
and has not developed national income very far. The British practice [1] and
Stone [18] also seem to approve of dual approaches, with a leaning toward the
product approach. Copeland [3] emphasizes the consumer approach and
questions the need for any other. On the other hand, Colm [2] sees a need for
a distinct producer approach, since he seeks a measure which is invariant under
shifts of resources. Also the welfare discussion as carried on by Samuelson
[16], Little [10, 11, 12] and others suggests that aggregative measures on a
consumer basis are meaningless, and that the production approach is better.
Ohlsson [14] feels that at least two output measures are desirable, but leans
toward the production approach [14, p. 104] because of the difficulties raised
by interpersonal comparisons in the consumer approach.

In this writer's opinion two measures of national output are desirable,
and these should be based clearly on the two principles mentioned. The
problems are both interpretive and statistical, though this would not be the case if
taxation, monopoly, and dynamic complications did not interfere with the workings
of a competitive economy. This follows from the principal theorem of competitive
equilibrium theory, namely that marginal substitution rates for consumer
choices will equal those for physical alternatives. The data of the United States
suggest, however, that tax considerations are sufficiently important to make
two statistical measures desirable.

We may make some crude calculations of the extent to which taxes interfere
with the equality of the consumer and production approaches by considering
prices net and gross of taxes. From the consumer point of view, gross prices
determine available alternatives. From the production point of view, prices
net of taxes measure the substitutability of goods, since such net factor returns
measure the costs of goods in a physical sense. Thus we seek an indication of the
difference between the two approaches by considering the importance of taxes in
different sectors of the economy.

Consider 1953, the postwar year with the largest share of government activity.
Assume that taxes on privately produced goods are the same whether or not
government purchases them, and that taxes on labor incomes are at the same rates
for government and non-government employees. Government is assumed to
pay no taxes on its business done directly by hired labor, except for personal
income taxes paid by government employees. On this basis we compute taxes
paid on government product as:


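$$
\text{taxes on gov. product} = \frac{\text{gov. compensation of employees}}{\text{all compensation of employees}} \times \text{personal income taxes} + \frac{\text{gov. purchases from business}}{\text{all business product}} \times \left(\text{taxes except personal income taxes of gov. product}\right)
$$

Using values in the table, we find the 1953 taxes on government product to be:

$$
\frac{32}{208} \times 38 + \frac{52}{331} \times \left(95 - \frac{32 \times 38}{208}\right) \approx 20.
$$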


GROSS NATIONAL PRODUCT AND RELATED VALUES
(billions of dollars)

                                                            1953
 1. Gross national product                                   363
 2. Government product                                        84
 3. Private product (1-2)                                    279
 4. Government receipts                                       95
 5. Personal taxes                                            38
 6. Taxes except personal (4-5)                               57
 7. Compensation for government employees                     32
 8. Compensation of employees except government              176
 9. Compensation of all employees (8+7)                      208
10. Government purchases from business (2-7)                  52
11. Business product (1-7)                                   331

Source: Survey of Current Business.


On this basis we find after-tax government product to be $64 billion (84-20). This is
24 per cent of total after-tax product of $268 billion (363-95). This may be
compared with 23 per cent government product on the gross national product
basis. The difference arises because private product includes taxes to a greater
extent than government product. While the difference is not large, it suggests
that the relation of values to their productive importance is a matter worthy of
investigation.
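The tax-attribution arithmetic can be checked in a few lines of Python. The inputs are the 1953 figures (billions of dollars) used in the worked calculation in the text; government purchases from business and business product are derived from the table's own definitions (lines 2-7 and 1-7). The variable names are illustrative.

```python
# 1953 values in billions of dollars, as used in the text's worked calculation.
GNP = 363
GOV_PRODUCT = 84
ALL_TAXES = 95          # government receipts
PERSONAL_TAXES = 38
GOV_COMPENSATION = 32
ALL_COMPENSATION = 208

# Derived using the table's own definitions.
gov_purchases = GOV_PRODUCT - GOV_COMPENSATION   # line 2 minus line 7
business_product = GNP - GOV_COMPENSATION       # line 1 minus line 7

# Personal income taxes paid by government employees, at the economy-wide rate.
personal_taxes_on_gov = GOV_COMPENSATION / ALL_COMPENSATION * PERSONAL_TAXES
# All remaining taxes, assumed to fall proportionally on business product.
other_taxes_on_gov = gov_purchases / business_product * (ALL_TAXES - personal_taxes_on_gov)
taxes_on_gov = personal_taxes_on_gov + other_taxes_on_gov

after_tax_gov = GOV_PRODUCT - taxes_on_gov
after_tax_total = GNP - ALL_TAXES
share_after_tax = after_tax_gov / after_tax_total
share_gnp = GOV_PRODUCT / GNP

print(f"taxes on government product: {taxes_on_gov:.1f}")
print(f"after-tax government share:  {share_after_tax:.0%}")
print(f"GNP-basis government share:  {share_gnp:.0%}")
```

The unrounded tax figure is about 19.9, which the text rounds to 20; the two shares come out at 24 and 23 per cent, matching the comparison above.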

Growth of government product is important to price indexes also. In 1947,
only about 12.5 per cent of gross national product was devoted to government
product. This rose to 23 per cent in 1953. Such a transfer of product from
private to government account, accompanied by rising tax rates, is highly
inflationary in a way that has nothing to do with changing price levels in a
monetary sense. Consider the inflationary effect of increasing government
product, say from 10 per cent of all product to 20 per cent, basic prices remaining
unchanged. Let t be the tax rate, p prices at factor cost, Y total physical
product, and G total government physical product. The choice of physical
units is arbitrary. We have


$$
\begin{aligned}
&p(1 + t)(Y - G), && \text{value of business product} \\
&p(1 + t)G, && \text{value of government product} \\
&\text{taxes} = pt(Y - G) + ptG = p(1 + t)G && \text{(balanced budget)}
\end{aligned}
$$


Suppose that t increases by Δt and G by ΔG, Y and p constant. It follows that

$$
\frac{1 + t + \Delta t}{1 + t} = \frac{Y - G}{Y - G - \Delta G}.
$$





If G and ΔG are each 0.1Y, the ratio is 9/8, indicating that doubling government
product is inflationary to the extent of 12½ per cent.
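Under the balanced-budget condition, taxes ptY equal the gross value of government product p(1 + t)G, so t = G/(Y − G) and 1 + t = Y/(Y − G). A minimal sketch using exact fractions confirms the 9/8 ratio for the example in the text; the function name is hypothetical.

```python
from fractions import Fraction


def gross_price_factor(Y: Fraction, G: Fraction) -> Fraction:
    """Return 1 + t under a balanced budget.

    Taxes pt(Y - G) + ptG = ptY must equal the gross value of
    government product p(1 + t)G, which gives t = G / (Y - G).
    """
    t = G / (Y - G)
    return 1 + t


Y = Fraction(1)        # total physical product (units are arbitrary)
G = Fraction(1, 10)    # government product: 10 per cent of Y
dG = Fraction(1, 10)   # raised to 20 per cent of Y

before = gross_price_factor(Y, G)
after = gross_price_factor(Y, G + dG)
ratio = after / before

print(ratio)                      # 9/8
print(f"{float(ratio - 1):.1%}")  # 12.5%
```

Exact rationals avoid any floating-point doubt about whether the ratio is exactly 9/8.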

Price indexes based on the production approach would yield new information 
on the nature of inflation. This conclusion supports that of the report of the 
joint committee on the National Economic Accounts [8, Chap. VI], calling for 
additional price studies. 


1. THE CONSUMER APPROACH 


Among the problems arising in this approach are treatment of savings and
consumer credit, definition and valuation of final product, interpersonal
comparisons and changes in tastes. We wish to see how far we can go in
resolving these problems by use of the opportunity cost principle of evaluation and
by use of the device of a typical household, in which average quantities of goods
are consumed. The opportunity cost principle made famous by Davenport and
other economists around the beginning of this century states that the
satisfactions obtained from consumption are measured by the alternatives sacrificed.
A more modern way to state the opportunity cost principle is in terms of
substitutions. Economic goods which are substitutes for a considerable number of
other consumer products are true consumer products. Those which are
complementary to consumer goods are productive or intermediate goods.

The use of a typical household is a familiar one in the case of consumer price 
indexes which are computed by using the average consumption of goods by 
families of certain income characteristics in selected cities as weights. A similar 
device can be used to interpret consumption data. 
Savings and consumer credit. If we follow the consumer income concepts of
Irving Fisher, counting only "for each article of capital, the values of all its
services and dis-services" [5, p. 140], we should not count savings as an integral
part of income. This view has been given token approval by Kuznets [9] and
others, but it is not generally followed. From our principle it follows that a
consumer who sacrifices part of his income in a time period for savings has
made a purchase, and this is marginally equivalent to consumption. Hence
savings are properly a part of output. To be sure, the savings may eventually
provide for future consumption, and if they do so, we shall in a sense have counted
savings twice. This does not prove the concept wrong, however. We shall be
comparing totals for different periods which are mutually interdependent, but
this involves no error. To cite an analogy, we may compare the number of cattle
slaughtered one year with that of another year, even though the number
slaughtered the first year influences the number in the second.

Some saving is done not by householders but by businesses and government
enterprises. Such saving must be considered as done on behalf of consumers,
representing sacrifice of consumer goods in favor of asset accumulation. The
problem that arises in this connection is the more basic one of valuation of
real assets. One can make an argument that revaluation of existing assets,
that is, capital gains and losses from inventory valuations, accidental damage,
and other value changes to wealth, must affect savings in the sense of consumer
sacrifice of potential consumption, just as other saved income is not spent. At
this point, however, one is attracted by savings concepts in production terms.
Newly produced assets could conceivably be replaced by consumer goods
through alternative outputs. This is not true of revalued items. Thus we
maintain concepts that are more acceptable from the productive point of view if
we use savings excluding capital gains and losses. This is an arbitrary decision
made necessary by the fact that the economic process is imperfect, since capital
gains and losses would not arise if perfect foresight existed and equilibrium
adjustments were perfect.

Consumer credit involves a sacrifice of future enjoyments for present ones. The
parallel with savings is complete. Note also that savings is a substitute for
consumer goods generally, while consumer credit is a complement. Repayment of
consumer credit is of course positive saving, while contraction of debt is negative
saving to the party involved.

We may assume that all interest on consumer credit is paid in a year or other
period subsequent to that of borrowing. It is true that finance charges are
frequently added to loans, and interest on any loan may be collected on a
discount basis. This merely raises the interest charge. It is not proper to speak
of interest payment during a period when the net flow of funds is from lender
to borrower. We may dismiss interest in periods of borrowing and consider
such payments rather as deductions from the amount loaned. Prices paid by
consumers who borrow and by others who pay cash are the same. Moreover,
interest paid on past consumer credit is a deduction from income. It involves
no current sacrifice of product and is not a consumer product. It is true that
consumer credit has made possible an increase in the total satisfaction of a
consumer over time. We have chosen, however, to represent current sacrifice as
the measure of consumer product, and on this basis interest on consumer
credit affects income, not consumption.

If these ideas seem unnatural, it may help to consider an alternative
procedure, that of the Fisher man. In this case we make each dollar of expenditure
yield equal satisfactions for any consumer no matter when the expenditure
occurs, after allowances for interest. Complete inter-period comparisons of
expenditures become possible. Adjustments for price changes between periods
are meaningless. Production of investment goods is neglected. These seem less
attractive notions than those given above.

One result of these definitions is that savings may be negative, since current
enjoyments may be obtained at the expense of future ones, a drain on the future.
Even if we measure income on a gross basis, with no capital allowances, the
possibility of negative saving exists, since liquidation of inventories of a
non-investment nature could more than offset gross investment in capital goods.
It is sometimes difficult for students of national income to see how a measure
of any part of output can be negative. From a consumer point of view the
solution is simple, however, since future enjoyments are sacrificed. Thus if A
is the value of physical output for consumption, B is the value of physical output
for savings, and C is the value of inventories liquidated, A + C is consumer product
and B - C is savings. For net measures, B does not include production to the
extent of capital allowances; for gross measures, it does.
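A tiny numerical illustration of the A, B, C split, with made-up values: when inventories drawn down for consumption (C) exceed output for savings (B), savings turn negative while total physical output A + B stays positive. The function name is hypothetical.

```python
def consumer_product_and_savings(a: float, b: float, c: float) -> tuple[float, float]:
    """Split output as described in the text.

    a: value of physical output for consumption
    b: value of physical output for savings (net or gross of capital allowances)
    c: value of inventories liquidated for consumption
    Returns (consumer product, savings) = (a + c, b - c).
    """
    return a + c, b - c


# Hypothetical values: inventory liquidation (15) exceeds output for savings (10).
consumer_product, savings = consumer_product_and_savings(100.0, 10.0, 15.0)
print(consumer_product, savings)  # 115.0 -5.0
```

The two components still add up to A + B = 110, so the negative figure reallocates output between periods rather than destroying it.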

Definition and valuation of final products. The basic notion is that for each 
consumer the marginal equivalence of two goods a and b is determined by the 
number of units of a the consumer is willing to substitute for one unit of b. 
For market goods, gross price, including all taxes, measures the availability of
alternatives. For home-produced services, the number of units of a market good
which would induce the consumer to give up a unit of service, if the choice
were available, becomes a valuation of goods in kind. Thus if 50 cents an hour
is the price for domestic service which would induce a housewife to curtail 
housekeeping activities, housekeeping has a value of 50 cents an hour. Only for 
the person who is on the margin of buying or actually buying household services 
is the cost of housekeeping a true opportunity cost. For others it is the price 
that would induce them to purchase which measures the substitutability of 
the service with other goods. People will not pay others to perform recreational 
and consumer household activities such as making love and tending children, 
and such activities are not a part of consumer product. 

Thus theoretically there is little difficulty in assigning values to goods in
kind, however difficult the statistical approximation may be. All goods
involving sacrifice of other goods should be included if the price which would be
paid for alternatives is above zero at the margin; this includes leisure when it
involves a sacrifice of income rather than the recuperation and refreshment
necessary to work. In the
case of bank services, the loss of interest in holding demand deposits is perhaps
the best measure of consumer sacrifice, though the proper measure of sacrifice
will vary by households.

The same principle operates in the case of government product. In so far as
government product involves a sacrifice of consumer services, it is a final
product. If it is a complement rather than a substitute for consumer products,
as some goods such as law enforcement and justice may be, government
products have no opportunity costs and should be excluded. The cost of
government final products as seen by the household is of course the proper basis
of pricing, since this measures the sacrifice. It may be objected that in time of
recession government products may involve no sacrifice, since resources are
freely available. This is not a proper argument, however, since resources could
be put to use for goods the consumer wants, as by tax cuts or consumer
bonuses. The product involves a consumer sacrifice and is not free even when
unemployment exists. Another objection may be that government product is
purchased by group, not by individual action. Then the sacrifice may not
measure that of any one household. The use of a typical household seems to
make this problem less severe, however.

Interest charges on government debt, except where real investment is
involved, partake of the nature of consumer credit. No sacrifice of one product
for another is involved in deficit financing. The losses and gains are entirely
in the current period. Rather, a redistribution of the time of sacrifice among
families is involved. Thus decisions of OBE to exclude government interest costs from
government product are correct from the opportunity cost point of view.

Another problem of defining final product is the separation of production and
consumption goods. Some items of household expenditure, such as (minimum)
commuting expense, are properly productive acts, not consumptive ones. The proper
means of distinction is that of consumer substitution. Those expenditures
which are rivals to known consumer goods are for enjoyment. Those which
are complementary to the income-earning process are productive expenditures.
Some consumer goods may be complementary, but since substitutes must
predominate, the rule is applicable if the comparisons are made with a sufficiently
broad set of goods [7, Chapt. 3].

A problem of duplication arises for gifts. The OBE does not count gifts, 
including charity, to domestic persons as product, since the consumption of the 
recipient who enjoys the physical consumption is represented. Mercy is not 
“twice blessed.” On the other hand, if a person remits to a foreign family, the 
giver is in effect assigned the enjoyment, else a physical product of the country 
is lost. Thus mercy abroad is felt to be "twice blessed." This procedure, which is
obviously reasonable from a production point of view, is difficult to handle from
a consumption point of view. One approach is to make the giver alone enjoy
the satisfaction since his is the sacrifice. This makes it possible to include 
product sent abroad and domestic product under the same category without 
multiplying satisfactions. 

Interpersonal comparisons and constant tastes. It is not necessary to assume 
constant preferences over time. If consumption of all goods increases 10 per 
cent, consumption has increased this amount, and in a sense the satisfaction 
level has increased 10 per cent. This assumed measuring device makes the
consumer satisfaction yardstick comparable to production level ideas. It is
necessary, however, to assume preference functions for a given period of time in order
that different consumption bundles be rendered comparable. On the basis of
such a function, we can approximate consumption bundles which can be
compared with those of other periods on a proportionality basis.
With these assumptions, we find that the problems of measurement are
familiar ones of index numbers. We take a set of goods II comprising the average
amount of each good consumed in time period II. The slope of the preference
curve at the point of equilibrium is given by ratios of prices, which means that the
budget line of period II approximates an indifference curve. We find on the
budget line a bundle of goods II′ equivalent in satisfactions to that of II such
that the goods in the bundle are in the proportions of I. The common ratio, say
a′, becomes the measure of proportional change in level of satisfactions. The
index formula for a′ is the Paasche quantity index, as indicated below. Thus
the level of satisfactions for period II relative to I on yardstick I, assuming
linear indifference functions in period II, is given by the Paasche quantity
formula, and the price change is given by the Laspeyre formula.


Fig. 954. Defining index a′ with linear indifference curve.


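We may write algebraically the results for comparison basis I as follows:

$$
\sum p^{II} q^{II} = \sum p^{II} \left(a' q^{I}\right) = a' \sum p^{II} q^{I}.
$$

Hence

$$
a' = \frac{\sum p^{II} q^{II}}{\sum p^{II} q^{I}}, \qquad
\frac{\sum p^{II} q^{II}}{\sum p^{I} q^{I}} = a' \cdot \frac{\sum p^{II} q^{I}}{\sum p^{I} q^{I}}.
$$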





If we used the bundle of goods II as a yardstick, we should obtain a quantity
or satisfaction index which is the Laspeyre formula, and a Paasche price
formula. The two sets of indexes appear to be equally good. It may be noted
that for goods for which price data are not available, a value adjustment for
non-covered goods may be employed in the quantity formula, similar to that
used by Fabricant, for example, in manufacturing output [4]. The assumption
required for its application is that prices of non-covered items move similarly
to those of covered items. In the case of savings, this adjustment may be
theoretically more correct than the use of price adjustments for investment goods,
since the latter have meaning only in terms of future productivity, and it is
current sacrifice which is being measured.
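The choice of yardstick can be made concrete with invented prices and quantities for a two-good typical household. The sketch computes a′ as the Paasche quantity index, checks that the scaled bundle a′q^I lies on period II's budget line, and confirms that the value ratio factors either as Paasche quantity times Laspeyre price (yardstick I) or as Laspeyre quantity times Paasche price (yardstick II). The data and function names are illustrative only.

```python
def value(prices, quantities):
    """Aggregate value: sum of price times quantity over all goods."""
    return sum(p * q for p, q in zip(prices, quantities))


def paasche_quantity(p1, q1, p2, q2):
    """a' in the text: the multiple of bundle I lying on period II's budget line."""
    return value(p2, q2) / value(p2, q1)


def laspeyres_price(p1, q1, p2, q2):
    return value(p2, q1) / value(p1, q1)


def laspeyres_quantity(p1, q1, p2, q2):
    return value(p1, q2) / value(p1, q1)


def paasche_price(p1, q1, p2, q2):
    return value(p2, q2) / value(p1, q2)


# Hypothetical two-good example: prices and average quantities per household.
p1, q1 = [1.0, 2.0], [10.0, 5.0]   # period I
p2, q2 = [1.5, 2.0], [12.0, 7.0]   # period II

a_prime = paasche_quantity(p1, q1, p2, q2)
value_ratio = value(p2, q2) / value(p1, q1)

# Yardstick I: Paasche quantity x Laspeyres price.
yardstick_1 = a_prime * laspeyres_price(p1, q1, p2, q2)
# Yardstick II: Laspeyres quantity x Paasche price.
yardstick_2 = laspeyres_quantity(p1, q1, p2, q2) * paasche_price(p1, q1, p2, q2)

print(a_prime, value_ratio, yardstick_1, yardstick_2)
```

Both decompositions reproduce the same value ratio, while the two quantity indexes differ (1.28 against 1.30 here), which is the sense in which the two sets of indexes are alternatives rather than duplicates.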
2. ASSUMPTIONS AND PROPERTIES OF THE CONSUMER APPROACH


1. Households. There exists in each time period a typical household which 
consumes the average amount of each product consumed by all households. 
For each time period there is a preference function expressing the desires of 
that household, and a linear approximation of an indifference function for each 
period is given by the budget line through the consumption point of the
typical household.

Classes of product. There exists a set H of items purchased by households and 
known to be consumer goods. Other goods purchased which are substitutes for 
most items of H are also consumer goods, and items purchased as gifts or with 
bad debts are also consumer products. Business performed services involving 
sacrifice to households, home produced goods involving actual or potential 
sacrifice of H, government services not complementary to consumer goods, 
savings either gross or net of capital allowances are also final products.
Excluded are government services, household-purchased items, and business
products that are complementary to consumer products, as well as interest on
consumer credit.

Valuation basis. For market goods, gross purchase price. For home produced 
products, price which would induce purchase from others. For business pro- 
fessional services, income sacrificed by consumer. For government services, 
costs as seen by household excluding interest. For savings, expenditures on 
new real assets, less expenditures which would have been necessary to replace 
liquidated inventories. 

Interpretation. The relative change in value of product in two periods or
places equals the ratio of the numbers of households, times the price index,
times the satisfaction-level index. The satisfaction index may be obtained by
using the linear indifference function of one period to determine a bundle of
goods equivalent in satisfaction to the actual consumption of a typical household
in the other period, the bundle having the property that each good is the same
multiple of its consumption in the other period. The satisfaction index will be a
Paasche or Laspeyres formula depending on which period's set of goods is used
as the comparison basis.
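The decomposition in this paragraph can be illustrated numerically. The sketch below is a minimal illustration under stated assumptions: the household counts, prices, and per-household quantities are hypothetical, and period I's budget line is taken as the linear indifference function, so the satisfaction index comes out as a Laspeyres quantity formula and the matching price index as a Paasche formula. The function names are illustrative only.

```python
import math

def cost(prices, quantities):
    """Value of a bundle: sum of price times quantity over goods."""
    return sum(p * q for p, q in zip(prices, quantities))

def satisfaction_index(p1, q1, q2):
    """Multiple k of the period I bundle that lies on the same level of period I's
    linear indifference function as the period II bundle:
    cost(p1, k*q1) == cost(p1, q2). This is the Laspeyres quantity formula."""
    return cost(p1, q2) / cost(p1, q1)

def decompose(n1, p1, q1, n2, p2, q2):
    """Split the ratio of total values (households times a typical household's
    spending) into household, price, and satisfaction factors."""
    value_ratio = (n2 * cost(p2, q2)) / (n1 * cost(p1, q1))
    household_ratio = n2 / n1
    sat = satisfaction_index(p1, q1, q2)
    price = cost(p2, q2) / cost(p1, q2)  # Paasche price formula
    return value_ratio, household_ratio, price, sat

# Hypothetical data: two goods, typical-household consumption in each period.
value_ratio, households, price, sat = decompose(
    100, [1.0, 2.0], [10, 5],   # period I: households, prices, quantities
    110, [1.1, 2.5], [12, 5],   # period II
)

# The three factors multiply back to the value ratio.
assert math.isclose(value_ratio, households * price * sat)
print(f"value {value_ratio:.4f} = households {households:.2f} "
      f"x price {price:.4f} x satisfaction {sat:.4f}")
```

Using period II's budget line instead would swap the roles, giving a Paasche satisfaction index paired with a Laspeyres price index.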


3. THE PRODUCTION APPROACH 


If the factor costs of products are $C_z$ per unit of $z$ and $C_y$ per unit of $y$,
we can argue that $C_y/C_z$ determines the marginal equivalence of $z$ and $y$, since
the cost of the factors used in producing a unit of $y$ will hire the factors necessary
to produce $C_y/C_z$ units of $z$. Among the problems arising in this approach
are the definition of factors, valuation, classification of product, and interpretation.

Factors of production. Economists sometimes classify all existing agents in- 
cluding materials and real capital items such as tools, buildings, etc., as factors. 
At other times they speak of factors in a more general sense, as capital, labor, 
resources, and entrepreneurship. For the problems here considered, the more 
general formulation is required, since we wish to consider substitution possi- 
bilities, and these require substitutions in forms of capital. With sufficient fore- 
sight, capital forms which deteriorate can be replaced with new forms, which 
means that property itself, not the specific forms of property, is a factor. Thus 
the various skills of labor, property, and entrepreneurship are the relevant
factors. Property returns take various forms such as interest, royalties, rents,
dividends, retained earnings and proprietor income, as has been emphasized
by Rolph [15]. The form of return is not important; the only important com-
plication is that entrepreneurship enters to some extent into most returns, in-
cluding wages, though possibly not into interest on low-risk securities. Such
entrepreneurship, if it represents good judgment, is itself a factor. Windfall
profits and other profits not due to skill of management are returns which are 
not factor returns. Monopoly profits are returns from limiting use of resources 
and are not payments to factors. These non-factor payments to owners of 
factors are limitations on the rationality of the economic system, and conse- 
quently upon the accuracy of economic measurement. 

Factor costs for individual commodities should include capital allowances
along with corporate profits and entrepreneurial return. The best substituta-
bility estimate of products is obtained by counting the cost of all factors necessary
for output. If capital replacement takes place in a smooth fashion, actual factor
costs can be used. However, it is probably simpler to treat capital allowances 
as one type of return. Gross product will then measure output from a zero 
point of no output in the given period of time. Net product will measure output 
from a zero line of no reduction in wealth. In either case, the value of individual 
products will include capital allowances, these being necessary to production 
in long run computations. 

Valuation. Valuation of non-market goods will be in terms of the market 
value of factors used. These valuations will be based on procedures used for 
market goods. Thus it suffices to consider the market case. 

We wish to consider the marginal product of factors in different employments. 
If we proceed along economic lines and not along technological lines, we use 
the minimum cost principle: the value of marginal product equals the sum, over
factors, of the cost of each factor times its marginal quantity. Here we are speaking
of ultimate factors. Material costs must be reduced to their factor costs. Though
we can eliminate
the effect of inter-industrial transactions in this way, we cannot eliminate taxes 
so readily, because these affect costs of factors. We are, thus, in something of a 
dilemma. From the supply standpoint, after-tax returns measure substituta- 
bility. Equal after-tax payments should yield the same amount of product. 
From the demand standpoint, before-tax payments determine substitutability. 
Equal before-tax payments should yield equal products. If taxes affected all 
products and factors proportionally, no problem would arise. However, 
property taxes, transportation excises, government exemptions, and corporate 
profit taxes are not likely to affect products equally. Certainly home produced 
products will be exempt. Thus any choice of factor evaluation will have to be a
compromise, since financial data cannot give the true marginal equivalence of
factors when employment of factors is distorted by non-proportional taxation
methods.

One approximation is obtained by assuming that tax rates and rates of pay 
for each factor are the same in all industries. If this were the case, the cost of
factors in the minimum cost principle would be the same. We would only need
to adjust selling prices for sales taxes and monopoly returns to obtain equality
of value of marginal products for any given product. Each product would be
valued at its sales price excluding sales tax, less monopoly profit or plus subsidy
(cf. equation (7) of the mathematical note). This method is essentially that of the
OBE national income estimates, except that OBE neglects depreciation. The 
difficulty with this simple approach is that the assumptions are scarcely valid, 
as suggested above. 
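As an illustration of this first approach, the sketch below values each product at its sales price excluding sales tax, less monopoly profit per unit, and takes the ratio of two such valuations as the marginal rate of substitution dz/dy. It is a minimal sketch that holds only under the assumption stated above, that factor tax rates and rates of pay are the same in all industries; treating a subsidy as a negative monopoly profit, the function names, and the numbers are illustrative assumptions.

```python
def net_unit_value(price, sales_tax_rate, monopoly_profit):
    """Sales price excluding sales tax, less monopoly profit per unit.
    A per-unit subsidy is passed as a negative monopoly_profit, so it is added."""
    return (1 - sales_tax_rate) * price - monopoly_profit

def mrs_uniform_factor_costs(Y, s_y, P_y, Z, s_z, P_z):
    """Units of z obtainable per unit of y given up (dz/dy); meaningful only if
    factor tax rates and factor pay are identical in the y and z industries."""
    return net_unit_value(Y, s_y, P_y) / net_unit_value(Z, s_z, P_z)

# Hypothetical example: y bears a 10% sales tax and earns monopoly profit,
# while z is untaxed and receives a per-unit subsidy.
y_value = net_unit_value(10.0, 0.10, 1.0)    # taxed, monopolized product
z_value = net_unit_value(5.0, 0.0, -0.5)     # subsidized product
mrs = mrs_uniform_factor_costs(10.0, 0.10, 1.0, 5.0, 0.0, -0.5)
print(y_value, z_value, mrs)
```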

A second approach is to use net disposable income of factors. We can justify 
this approach along these lines. Assume that each factor employed finds its 
best net reward. If we assume further that the production functions are such 
that average cost equals marginal cost we can rewrite the minimum cost 
principle: 

$$\text{average cost} = \text{price of product less monopoly profit and sales tax} = \frac{\sum \left(1 + \text{tax rate on factor}\right) \times \text{factor net return} \times \text{marginal factor}}{\text{marginal product}}$$

The summation is over factors. This equation can be rewritten as:

$$\text{marginal product} = \frac{\sum \left(1 + \text{tax rate on factor}\right) \times \text{factor net return} \times \text{marginal factor}}{\text{average cost}}$$

If we wish to compare marginal products of two goods, say y and z, we may
note that the numerators on the right contain the same terms except for
tax influences. The ratio of two such numerators is an index of the factor
taxes affecting each product. We obtain

$$\text{marginal rate of substitution} = \frac{\text{average cost of } y}{\text{average cost of } z} \div \left(\text{index of taxes on factors of } y \text{ relative to those on } z\right)$$

Dividing each product's average cost by the taxes on its own factors leaves,
approximately, its average after-tax cost. Hence, approximately,

$$\text{marginal rate of substitution} = \frac{\text{average after-tax cost of } y}{\text{average after-tax cost of } z}$$
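The approximation can be checked numerically using the forms given as equations (9) and (10) of the mathematical note. The sketch below is hypothetical in its data and function names; it assumes, as the note does, that each factor receives the same after-tax return in both industries. When every factor used for a given product bears the same tax rate, the tax index cancels exactly and the after-tax ratio reproduces dz/dy; with unequal rates the two differ, which is the sense in which the result is approximate.

```python
def tax_weighted_cost(tax_rates, returns, quantities):
    """Sum over factors of (1 + tax rate) x after-tax return x quantity."""
    return sum((1 + r) * I * q for r, I, q in zip(tax_rates, returns, quantities))

def mrs_with_tax_index(I, r_y, inputs_y, y, r_z, inputs_z, z, d):
    """dz/dy for a marginal bundle d of factors: (average cost of y / average cost
    of z) times the index of taxes on z's factors relative to y's, which is the
    same as dividing by the index for y relative to z."""
    ac_y = tax_weighted_cost(r_y, I, inputs_y) / y
    ac_z = tax_weighted_cost(r_z, I, inputs_z) / z
    index_z_to_y = tax_weighted_cost(r_z, I, d) / tax_weighted_cost(r_y, I, d)
    return (ac_y / ac_z) * index_z_to_y

def mrs_after_tax(I, inputs_y, y, inputs_z, z):
    """Ratio of the average after-tax factor costs of y and z."""
    after_tax_y = sum(Ii * q for Ii, q in zip(I, inputs_y)) / y
    after_tax_z = sum(Ii * q for Ii, q in zip(I, inputs_z)) / z
    return after_tax_y / after_tax_z

# Hypothetical data: two factors with after-tax returns 2.0 and 3.0.
I = [2.0, 3.0]
inputs_y, y = [4.0, 2.0], 10.0
inputs_z, z = [1.0, 3.0], 5.0
d = [1.0, 1.0]  # marginal bundle of factors

# Proportional taxes within each product: the two measures agree.
exact = mrs_with_tax_index(I, [0.2, 0.2], inputs_y, y, [0.5, 0.5], inputs_z, z, d)
approx = mrs_after_tax(I, inputs_y, y, inputs_z, z)
print(exact, approx)

# Non-proportional taxes: the after-tax ratio is only an approximation.
uneven = mrs_with_tax_index(I, [0.1, 0.4], inputs_y, y, [0.5, 0.2], inputs_z, z, d)
print(uneven, approx)
```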
Application of this method would not be simple, since some tracing of tax 
payments among outputs would be required. If only a few broad categories 
of final product were distinguished, the problems might not be too difficult. 
It may be noted that all taxes, including personal income taxes, should be
excluded. The only taxes which should not be excluded from factor returns
are taxes falling on monopoly and windfall profits, that is, taxes on returns
which are not properly factor returns at all. This analysis differs from the
incidence approach suggested by OBE [13, p. 33].

Classification of products. The valuation procedure used in the production 
approach does not require a classification of output in the way that the con- 
sumer approach does, since duplication of outputs is no problem. Each pro- 
ductive activity is reduced to a factor valuation comprising value added by 
such activity. We may total all such value added figures in any groupings we 
please, since each use of ultimate factors has an alternative product. The most 
meaningful classification of products appears to be that of the consumer 
approach. We could indeed create a new system using the notion of comple-
mentarity to productive activities. There seems to be little purpose in this,
however.

Some minor differences in classifications might be desirable. We could omit
the savings category and list only investment activities, that is, the value of
creation of new assets. Consumer output would not be the value of consumption,
but the value of output for consumption. Referring to the discussion above, we
may count A as value of output of consumption goods and B as value of output
of addition to real assets, rather than A − C and B + C respectively, where C
is inventory liquidation. It seems awkward to speak of negative productive
activity. Thus investment might be defined differently from the savings of the
consumer approach. 

Offsetting values of exports and imports, with net difference representing 
foreign investment or government product, is a more meaningful computation 
from a consumer point of view than from a producer point of view. It might 
be useful to depart from this classification for producer evaluations. Thus all 
exports could be classed as a group of outputs. Imports could be neglected as 
a part of consumer and investment output. Investment and consumption 
would include values of output domestically produced only. 

Interpretation. Suppose that we have obtained factor cost, value-added 
figures for a set of products or productive activities. Suppose also that we can 
separate price and quantity components of such values. We can interpret 
these data in index number terms. Assume that a constant value function for 
period II approximates a constant output function. We determine a set II′ of
equivalent output levels such that the individual items in II′ are in the same
ratio as those of I. The constant ratio, say θ′, of output in II′ to that in I is the
measure of output. As explained elsewhere [17], this index θ′, based on the pro-
ductive capabilities of period II with yardstick I, is approximated by a Paasche
quantity formula. The quantity index θ″, using the productive mechanism of I
with yardstick II, gives the Laspeyres quantity formula. The price formulas are
obtained by dividing the value change by the output indexes. Corresponding to
θ′ is the Laspeyres price index and to θ″ the Paasche price index. If we choose,
we can consider the index θ′ as the reciprocal of the length of time it would
take the productive mechanism of period II to produce the outputs of period I.
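The pairing of quantity and price indexes described above can be traced numerically. The sketch below assumes hypothetical prices and outputs for two goods and treats each period's prices as defining its linear constant-value line. It checks that dividing the value change by θ′ gives the Laspeyres price formula and dividing by θ″ gives the Paasche price formula, and it computes the reciprocal-time reading of θ′. The helper names are illustrative, not taken from the paper.

```python
import math

def value(prices, quantities):
    """Sum of price times quantity."""
    return sum(p * q for p, q in zip(prices, quantities))

def theta_prime(p2, q1, q2):
    """Multiple of period I's output mix that lies on period II's linear
    constant-value line through q2 (Paasche quantity formula)."""
    return value(p2, q2) / value(p2, q1)

def theta_double_prime(p1, q1, q2):
    """Period II output relative to period I, measured along period I's
    constant-value line (Laspeyres quantity formula)."""
    return value(p1, q2) / value(p1, q1)

# Hypothetical two-good data for periods I and II.
p1, q1 = [4.0, 1.0], [3.0, 8.0]
p2, q2 = [5.0, 1.5], [4.0, 6.0]
value_change = value(p2, q2) / value(p1, q1)

t1 = theta_prime(p2, q1, q2)
t2 = theta_double_prime(p1, q1, q2)

laspeyres_price = value(p2, q1) / value(p1, q1)
paasche_price = value(p2, q2) / value(p1, q2)

# Price indexes obtained by dividing the value change by the output indexes.
assert math.isclose(value_change / t1, laspeyres_price)
assert math.isclose(value_change / t2, paasche_price)

# Time, in units of the period, for period II's mechanism to produce period I's outputs.
periods_needed = 1 / t1
print(t1, t2, laspeyres_price, paasche_price, periods_needed)
```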


4. SUMMARY OF ASSUMPTIONS AND DEFINITIONS OF THE PRODUCTION APPROACH 


Factors. Ultimate factors include all types of labor, property, skill of manage- 
ment and entrepreneurship, and capital allowances representing the value of 
other factors necessary to maintain property in its productive form. 

Factor cost. It is possible to determine for each productive activity the cost
of purchased materials, direct factor costs, capital allowances, monopoly returns
or subsidy, windfall profit and taxes. Capital allowances are determined by the
current factor cost required for maintaining capital. Direct factor cost and
capital allowances determine value-added factor cost for each productive activity.
For home produced services, the amount which would be paid for similar
types of employment is used for valuation. The extent to which taxes should be
eliminated depends upon the degree of nonproportionality of taxes on factors
among products. It is probably best to eliminate all taxes except taxes falling
on monopoly and on windfall profits.

Classification of products. Value added may be accumulated in any convenient 
form, such as by industry. Final products as defined in the consumer approach 
appear most meaningful. Departure from such classification might well be
made for foreign products and inventory liquidations, however.

Interpretation. Assume the existence of constant product curves for each
period, approximated by linear constant-value functions. We ask what propor-
tional increase in each of the outputs of period I could have been produced
with the factors of period II. Alternatively, we ask about the proportional
change if factors of period I had been directed to producing products of period
II. The familiar Paasche and Laspeyres index formulas are so derived.


5. CONCLUSION 


The discussion above has not dealt with many of the special problems which 
arise in output measurement. Perhaps enough points have been covered to 
indicate that by clinging to the fundamental economic ideas of substitution, 
one may deal effectively with problems of definition. The notions of consumer 
opportunity cost and of a typical household were used to define final products, 
to suggest the proper means of valuing those products and to interpret the 
significance of the final measures. The notion of substitution in production was 
used to suggest a valuation of products in terms of factor costs, and to interpret
the results in quantitative terms. It was suggested that classification of final
products rests primarily on consumer approach criteria. To the writer’s knowl-
edge, most of the definitional and conceptual problems of output measurement
can be handled along similar lines, although arbitrary decisions such as the one
made above in the case of capital gains are required when the rationality of the
economic process breaks down.

It was argued above that two measures are desirable, because government 
taxes have distorted prices sufficiently to make prices from production and 
consumption alternatives significantly different. In particular, increases in 
government activity inflate prices in a way that does not represent true price
rises in a monetary sense. Also, the amount of government activity in a resource
sense may be distorted if consumer prices are used for measurement. Other good
reasons for two measures exist that have not been discussed. One is that the 
production approach is desirable for industrial breakdown purposes, while the 
consumer approach is necessary for development of measures which seem real- 
istic because they use prices and quantities taken directly from observed 
markets. 

Some writers, notably Ohlsson [14], have stressed the multitude of uses of
output and income data. Taxation, consumer behavior and income distribution
studies, for example, require information on transactions of a varied and ex-
tensive nature. The concept of total output is only incidental to these. For such
studies an almanac of money flow and other information is needed. This view
seems to the writer rather beside the point. One cannot deny the need for
economic almanacs, but one can also assert that comprehensive measurements
of the accomplishment of economic systems are needed. The more exactly these
measures are defined, the more useful they will be.


MATHEMATICAL NOTE 


Let

$$x = S(a, \ldots, n) \tag{1}$$

be a production function with ultimate factors $i = a, \ldots, n$. Assume the partial derivatives of $S$ exist. Let $I_i$ be the after-tax return per unit of factor $i$. Let $(1 + r_{ix}) I_i$ be the cost of a unit of factor $i$ to the producer of $x$, where $r_{ix}$ is a tax rate. Let $X$ be the sales price of $x$, of which $(1 - s_x)X$ is collected by the producer on final sale, $s_x$ being the sales tax rate. Assume also

$$X = D(x), \quad \text{the demand curve for } x \text{ as seen by the individual producer,} \tag{2}$$

$$I_i = f_i(i), \quad \text{the supply of } i \text{ as seen by the individual producer.} \tag{3}$$

In general, small letters denote quantities and capital letters prices. Subscripts denote the product affected if $x$, $y$, or $z$, or the factor if $i$.


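Maximum profit $(1 - s_x)Xx - \sum_i (1 + r_{ix}) I_i\, i$ for a given output $x$ has the necessary conditions:

$$\lambda = (1 - s_x)X + (1 - s_x)\,x\,\frac{dD}{dx} = \frac{(1 + r_{ix})\left(I_i + i\,\dfrac{df_i}{di}\right)}{\partial S/\partial i}, \qquad i = a, \ldots, n. \tag{4}$$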


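If $S$ is homogeneous of the first degree, we can write $x = \sum_i (\partial S/\partial i)\, i$ and solve for $\lambda$ in terms of monopoly profit per unit of output, say $P_x = -(1 - s_x)\,x\,dD/dx$. The expression for $\lambda$ becomes

$$(1 - s_x)X - P_x = \frac{\sum_i (1 + r_{ix})\left(I_i + i\,\dfrac{df_i}{di}\right) i}{x}. \tag{5}$$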
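Writing $x$ as the sum of marginal products times factor quantities relies on Euler's theorem for functions homogeneous of the first degree. As a quick numerical check, the sketch below assumes a hypothetical two-factor Cobb-Douglas function whose exponents sum to one (not a function given in the paper) and compares $x$ with the sum of $(\partial S/\partial i)\, i$, estimating each partial derivative by a central difference.

```python
def S(a, b):
    """Hypothetical production function, homogeneous of the first degree
    (Cobb-Douglas with exponents 0.3 + 0.7 = 1)."""
    return a ** 0.3 * b ** 0.7

def partial(f, args, k, h=1e-6):
    """Central-difference estimate of the partial derivative of f with respect
    to its k-th argument, evaluated at args."""
    up, down = list(args), list(args)
    up[k] += h
    down[k] -= h
    return (f(*up) - f(*down)) / (2 * h)

factors = (4.0, 9.0)  # hypothetical quantities of factors a and b
x = S(*factors)
euler_sum = sum(partial(S, factors, k) * factors[k] for k in range(len(factors)))
print(x, euler_sum)  # equal up to finite-difference error
```

Any function homogeneous of the first degree would do; the Cobb-Douglas form is chosen only because its homogeneity is easy to verify by eye.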


We also obtain for marginal product

$$dx = \sum_i \frac{\partial S}{\partial i}\,di = \frac{\sum_i (1 + r_{ix})\left(I_i + i\,\dfrac{df_i}{di}\right) di}{(1 - s_x)X - P_x}. \tag{6}$$


If we assume $df_i/di$, $I_i$, and $r_{ix}$ are the same for a given $i$ when $x = y$ and $z$, we obtain for the substitutability of $y$ and $z$ from (6)

$$\frac{dz}{dy} = \frac{(1 - s_y)Y - P_y}{(1 - s_z)Z - P_z}. \tag{7}$$


As a second approach, let us assume that monopsony does not exist or affects marginal costs and average costs equally, that after-tax returns of factors are in equilibrium, and that average cost equals marginal cost. The last condition is fulfilled if production conditions are linearly homogeneous. We obtain from (5) and (6)

$$\frac{\sum_i (1 + r_{ix}) I_i\, i}{x} = (1 - s_x)X - P_x = \frac{\sum_i (1 + r_{ix}) I_i\, di}{dx}. \tag{8}$$


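For two products we obtain for the marginal rate of substitution, with a given set of $di$,

$$\frac{dz}{dy} = \frac{z \sum_i (1 + r_{iy}) I_i\, i_y}{y \sum_i (1 + r_{iz}) I_i\, i_z} \cdot \frac{\sum_i (1 + r_{iz}) I_i\, di}{\sum_i (1 + r_{iy}) I_i\, di}, \tag{9}$$

where $i_y$ and $i_z$ denote the quantities of factor $i$ employed in producing $y$ and $z$.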


The expression

$$\frac{\sum_i (1 + r_{iz}) I_i\, di}{\sum_i (1 + r_{iy}) I_i\, di}$$

can be considered an index of the tax factors affecting employment in $z$ and $y$. Thus, as an approximation, we can use this to remove the tax influences in average costs. Hence we obtain from (9)

$$\frac{dz}{dy} = \frac{z \sum_i I_i\, i_y}{y \sum_i I_i\, i_z}. \tag{10}$$

The average after-tax factor payments become an approximation of the proper valuation of each final product under the assumptions of equal monopsony effects, equilibrium returns to factors, and linearly homogeneous production conditions.
NOTES ON IMMIGRATION STATISTICS OF THE UNITED STATES*


E. P. Hutchinson
University of Pennsylvania 


The continuous record of immigration to the United States, initiated 
by the Act of March 2, 1819 and beginning with the fiscal year ending 
September 30, 1820, provides one of our longest annual series of official 
data. Like other long series, however, it has undergone considerable 
change in scope and in the basis of reporting since its inception more 
than a century and a quarter ago. A summary of changes in the 
immigration series is given in each annual volume of the Statistical
Abstract of the United States and, in more detail, in the supplementary 
volume, Historical Statistics. The present account gives fuller reference 
to the published sources, indicates in more detail the changes that have 
taken place in the basis of reporting immigration, and assembles notes 
for the evaluation of the series. 


Part I. Sources


A. Official sources


1. Department of State series, 1820 to 1870 or 1874
2. Bureau of Statistics series, Treasury Department, 1867 to 1895
3. Bureau of Immigration or INS Series, 1892 to date
4. Secondary sources


B. Other sources


Part II. The Estimates and Official Statistics of Immigration


A. Immigration before 1819
B. The Department of State series, 1820-1867
1. Aliens or foreign born
2. Arrival or embarkation
3. Temporary visitors and aliens in transit
4. Fiscal year
5. Nativity or nationality
6. Land border arrivals
7. Evaluations
C. The Bureau of Statistics series, 1868-1891
1. Comparison with Department of State series
2. Basis of reporting immigration
3. Country of origin
4. Land border arrivals
5. Reentry of aliens
D. The Bureau of Immigration or INS Series, 1892 to date
1. Comparison with Bureau of Statistics series
2. Cabin and steerage passengers
3. Land border arrivals
4. Arrivals or admissions
5. Aliens in transit
6. Reentry of aliens
7. Continental United States and Insular Possessions
8. Redefinition of immigrants
9. Illegal entry
E. Summary
Appendixes
A. Detailed Notes on the Immigration Statistics for single years (1820-1910)
B. List of Publications containing the Original Reports of Immigration Statistics,
1820-1891
C. Manifest Requirements in Immigration Acts
Act of June 25, 1798
Act of March 2, 1819
Act of March 3, 1855
Act of May 6, 1882
Act of August 2, 1882
Act of September 13, 1888
Act of March 3, 1891
Act of March 3, 1893
Act of March 3, 1903
Act of February 20, 1907
Act of February 5, 1917
Act of July 30, 1947
Act of June 27, 1952


* This paper was prepared to supplement statistics on immigration in the forthcoming new edition of His-
torical Statistics of the United States compiled under the joint sponsorship of the Bureau of the Census and the Social
Science Research Council.

The author is indebted to Ernest Rubin of American University and Helen Eckerson of the Immigration and
Naturalization Service for information on many obscure points of the immigration statistics, and to Niles Carpenter
of the University of Buffalo for careful reading of the manuscript and a number of helpful suggestions.

Permission of author and publisher to quote from Willcox, International Migrations, vol. II, National Bureau
of Economic Research, is acknowledged.


PART I. SOURCES 


A. Official sources 


The Act of June 25, 1798 (1 Stat. 570), which was enacted for a period of
two years, contained in Section 3 the requirement that alien passengers
on vessels arriving in the United States be reported to the collector or other
chief officer of customs at the port of entry.¹ The continuous reporting of
immigration to the United States began, however, with the 1819 Act (3 Stat.
489), which required that passenger lists or manifests of all arriving vessels be
delivered to the local collector of customs, copies transmitted to the Secretary
of State, and the information reported to Congress. Three periods can be dis-
tinguished in the subsequent reporting of immigration: from 1820 to 1870 or
1874, when the information was compiled by the Department of State; from
1867 to 1895, by the Bureau of Statistics of the Treasury Department; and
from 1892 onward, by a separate Office or Bureau of Immigration, now the Im-
migration and Naturalization Service.²

1. Department of State series, 1820 to 1870 or 1874.—In compliance with the 
Act of 1819, reaffirmed by the Act of 1855, the Secretary of State made annual 
reports to Congress. These reports begin with the fiscal year October 1, 1819
to September 30, 1820. The series is stated to have continued to 1874, but the 
last report found in Congressional documents is for the year ending December 
31, 1870. It provides the present official statistics from 1820 to 1867 inclusive. 
See Appendix B for list of the Department of State reports to Congress. 

1 See Appendix C for the text of the reporting requirements contained in this and later acts. 
2 Several state acts called for passenger lists of incoming vessels, including New York acts of 1788 and 1824,
Massachusetts acts of 1794 and 1820, and a Maryland act of 1833. See Edith Abbott, Immigration: Select Documents
and Case Records, Chicago: University of Chicago Press, 1924, pp. 104-109.
2. Bureau of Statistics series, Treasury Department, 1867 to 1895.—It provides
the present official statistics from 1868 to 1891 inclusive. The immigration 
statistics compiled by the Bureau of Statistics were published currently in the 
following serial publications: 


Monthly Report of the Director of the Bureau of Statistics, issued in a series of 29 
consecutively numbered reports from November 15, 1866 through the fiscal year 
ended June 30, 1869. Quarterly statements on immigration give passenger arrivals 
by port, classified as immigrants and nonimmigrants, and for most quarters by 
nationality, age group, and occupation. See Appendix B for list of quarterly reports 
and variation of title. 


Monthly Reports on the Commerce and Navigation of the United States, by the Chief of 
the Bureau of Statistics, issued for fiscal years ended June 30, 1870 to 1875 inclusive. 
Quarterly statements on immigration give passenger arrivals, with some variation, 
by class (cabin, not cabin), immigrant and non-immigrant, sex, age group, nationality, 
and deaths on voyage; and annual summaries give some additional information in- 
cluding occupational distribution. See Appendix B for list of quarterly reports. 
Replaced by Summary Statement, etc. fiscal year 1876 onward (see below). 


Quarterly Report of the Chief of the Bureau of Statistics, Treasury Department, Relative 
to the Imports, Exports, Immigration, and Navigation of the United States, fiscal years
1876 to 1893 inclusive. Quarterly reports of passenger arrivals and immigrants by 
port, sex, age group, country, and occupation; deaths on voyage; and annual summary 
including recapitulation from 1820 to date. 


Summary Statement of the Imports and Exports of the United States, issued by the 
Bureau of Statistics, fiscal years 1876 to 1893 inclusive. No immigration statistics 
given fiscal years 1876 to 1889 inclusive; monthly statement of immigration by port 
and nationality and summary for year to date, No. 1 (July), 1889/90 to No. 12 
(June), 1892/93. (Replaced by the Bureau of Statistics, Treasury Department 
publication, Monthly Summary of the Imports and Exports of the United States, New 
Series, followed by Monthly Summary of Commerce and Finance of the United States, 
fiscal years 1894 to 1903 inclusive.) 


The following summary reports and reprintings, incorporating the earlier 
Department of State material, were also issued by the Bureau of Statistics: 


Special Report on Immigration, by Edward Young, Washington, Government Printing 
Office, 1872. A summary of immigration statistics from 1820, with comment by the 
Chief of the Bureau of Statistics. 


Tables showing Arrivals of Alien Passengers and Immigrants in the United States from
1820 to 1888, Washington, Government Printing Office, 1889. Introductory notes by 
the Chief of the Bureau, with about 120 pages of tables. 


Arrivals of Alien Passengers and Immigrants in the United States from 1820 to 1890, 
Washington, Government Printing Office, 1891. Introductory section with text and 
tables, then over one hundred pages of immigration data. Also 1893 edition with data 
to 1892. From Quarterly Reports. 


Immigration and Passenger Movement at Ports of the United States, 1892/93 to 1894/96,
Washington, Government Printing Office, 1894-96. Excerpts from the annual reports 
of the Bureau of Statistics. 


Immigration into the United States, showing Number, Nationality, Sex, Age, Occupation, 
etc., from 1820 to 1903, Washington, 1903. From the Monthly Summary of Commerce
and Finance, June, 1903. 
3. Bureau of Immigration or INS Series, 1892 to date.—Since 1892 there has
been an Office or Bureau of Immigration, which became the Bureau of Immi-
gration and Naturalization in 1912 and the Immigration and Naturalization
Service in 1932. Beginning with the fiscal year 1892, the Bureau of Immigration
and its successors have been responsible for the compilation of immigration
statistics, which have been published in the Annual Reports.³ For the fiscal
years 1933 to 1940 inclusive, the work of the Immigration and Naturalization
Service was summarized in each Annual Report of the Secretary of Labor, and 
for the fiscal year 1941 in the Annual Report of the Attorney General. No report 
was published in 1942. The Annual Report of the Immigration and Naturalization 
Service was published in mimeographed form for the fiscal years 1943 to 1954 
inclusive, and in printed form from 1955 to date. 

4. Secondary sources.—Since the Annual Reports are not widely available, 
the various secondary sources of immigration data are useful. Brief summaries 
of immigration statistics are published in the Statistical Abstract and in His- 
torical Statistics, and compilations of immigration data are contained in official 
studies and reports on immigration. The following are especially useful as 
sources of the older immigration material. 


1901. E. Dana Durand, “General Statistics of Immigration and Foreign-born 
Population” in Vol. XV, Part II, pp. 259-291, Reports of the Industrial Commission, 
Washington, Government Printing Office, 1901. 


1903. Monthly Summary of Commerce and Finance of the United States, Series 1902- 
1903, No. 12, June, 1903 (57th Congress, 2d session, House Document No. 15, Part 
12), Immigration into the United States, from 1820 to 1903, pp. 4333-4444. 


1911. Reports of the Immigration Commission, Vol. I (61st Congress, 3d session, 
Senate Document No. 747), pp. 51-118, Abstract of the statistical review of immigra- 
tion to the United States, 1820 to 1910; and Vol. III (61st Congress, 3d session, Senate 
Document No. 756), Statistical Review of Immigration, 1820-1910. 


B. Other sources 


In addition to the above official sources there are a number of non-official 
reports and studies on immigration. Listed below in chronological order are 
publications especially important for their reporting of the older series or for 
their analysis and evaluation of the officially reported statistics of immigration. 


Samuel Blodget, Economica, a Statistical Manual for the United States, Washington, 
printed for the author, 1806. 

This contains, pp. 57-58, estimates of annual immigration for selected years, 1774 
to 1806, together with comment on the contribution of immigration to the growth of 
the population of the United States. 


Adam Seybert, Statistical Annals of the United States of America, Philadelphia, 
Dodson, 1818. 

Section II, “Of the Emigration from Foreign Countries,” pp. 28-30, discusses post-
colonial immigration and its contribution to population growth, and gives a tabulation
of reported passenger arrivals in 1817 at selected ports, according to country of origin.

3 Annual Report of the Superintendent of Immigration (or Commissioner General of Immigration), fiscal years
1892 to 1932 inclusive, thereafter Annual Report of the Immigration and Naturalization Service. The report was to
the Secretary of the Treasury for the fiscal years 1892 to 1903 inclusive, to the Secretary of Commerce and Labor
for the fiscal years 1904 to 1912 inclusive, to the Secretary of Labor for the fiscal years 1913 to 1940 inclusive, and
thereafter to the Attorney General, Department of Justice. Also published in the Annual Reports of the Secretary of
Commerce and Labor, 1904 to 1912 inclusive.


George Tucker, Progress of the United States in Population and Wealth, New York,
Hunt’s Merchants’ Magazine, 1843. 

Chapter 10, “Emigration,” quotes and discusses estimates of immigration before 
1819, and gives critical comment on the official immigration statistics. 


Jesse Chickering, Immigration into the United States, Boston, Little and Brown, 1848. 

This contains an estimate of the population of foreign origin (immigrants and their 
descendants) at each census year from 1800 to 1840 inclusive (pp. 21-26). An appendix 
gives excerpts concerning immigration from Niles’ Weekly Register, 1815-1820, 
(pp. 67-84). 


J. D. B. DeBow, The Industrial Resources, Statistics, etc., of the United States, 3d
edition, 3d vol., New York, Appleton, 1854. 

Vol. 3, pp. 395-400, 424-425, contains a discussion of immigration by J. B. Auld, 
cites estimates of immigration before 1819, and estimates the number of immigrants 
since 1790 and their descendants in 1850. Further notes by DeBow are given on pp.
119-124 of Statistical View of the United States, a Compendium of the Seventh Census, 
Washington, Nicholson, 1854. 


William J. Brownell, History of Immigration to the United States, New York, Redfield, 
1856. 

This contains introductory notes on immigration statistics and on immigration 
before 1819, together with tables for each fiscal year, 1820 to 1855 inclusive, showing 
the number of passenger arrivals according to sex, port of entry, age group, occupa- 
tion, and “country where born.” The best single source of the early immigration 
statistics. 


Frederick Kapp, Immigration and the Commissioners of Emigration of the State of 
New York, New York, Nation Press, 1870. 

The author, a New York Commissioner of Emigration, cites estimates of immigra- 
tion before 1819 and gives valuable notes on later immigration, especially that to 
New York State (Ch. 1). Accounts are also given of conditions of trans-Atlantic 
travel and of mortality at sea (Ch. 2 and appendix). The appendix tables show the 
number of alien passengers by single years, 1820 to 1850, the occupations of passengers 
and the country of birth by decade, and more detailed information for the most recent 
years and for the port of New York. 


Edward Jarvis, “Immigration,” Atlantic Monthly, April, 1872, 29: 454-468. 

This analyses statistical information concerning the contribution of immigration 
to the growth of population in the United States, and includes discussion of the im- 
migration statistics. 


Edward Jarvis, History of the Progress of Population of the United States from 1790 
to 1870, Boston, Clapp, 1877, 16 pp. 

An estimate of immigration to the United States from 1790 to 1870 by decade is 
included. 


Walter F. Willcox, editor, International Migrations, Vol. II, Interpretations, New York,
National Bureau of Economic Research, 1931. 

Chapter II, “Immigration into the United States,” by Willcox is a summary of 
information on immigration and the foreign-born population of the United States. 
Appendix II, “Critique of Official United States Immigration Statistics,” by Marian 
Rubins Davis is the most detailed account of the changing composition of the official 
immigration series. 


Simon Kuznets and Ernest Rubin, Immigration and the Foreign Born, Occasional 
Paper 46, National Bureau of Economic Research, 1954. 
Pages 55-64 deal with the composition and interpretation of immigration statistics
from 1870 onward. Appendix A gives a comparative account of estimates of net im-
migration by decade, 1820-1940. Appendix B includes estimates of alien passenger
arrivals and departures for each fiscal year, 1870-1945. 


Ernest Rubin, Immigration and the Economic Growth of the United States, 1790-1914. 
National Bureau of Economic Research, Conference on Research in Income and 
Wealth, September 4-5, 1957. 

The first section of this paper contains a valuable critique of the estimates of early
immigration and of the official immigration statistics.


PART II. THE ESTIMATES AND OFFICIAL STATISTICS OF IMMIGRATION 


Assembled below are notes on early immigration and the later official im- 
migration series, with particular reference to the changing composition of the 
reported data and to the evaluation of the material. 


A. Immigration before 1819 


For the period from the Revolutionary War to 1819 there are scattered 
reports of passenger arrivals at some ports;⁴ but little is known of the total
volume of immigration during this period, and only estimates are available. 
Blodget, who wrote in 1806 and could present contemporary opinion, gave the 
following estimates of immigration in selected years.⁵


It was his opinion that immigration had not added greatly to the population of 
the United States. As he wrote: 


For the present rapid increase of population the United States are less indebted to 
foreign emigration than was formerly believed, if reliance may be placed on the best 
records and estimates at present attainable; by these they have not averaged more 
than 4,000 for the last ten years, while it is known that about half that number have 
migrated from the United States, a part to Upper Canada, and more as seafaring 
adventurers, to every port of the globe.⁶


Seybert, writing a little more than a decade later and at a time when im-
migration was believed to be greater than formerly, cited Blodget’s earlier
estimates but gave a generally higher appraisal of the volume of immigration
and presented different estimates for certain years, as follows:

It is believed that the population of the United States has been much augmented
by the emigrants from Europe: there are no authentic documents on this subject,
and we can only estimate the increase we have thus acquired. The emigrants came
principally from Great Britain, Ireland, and Germany; but few have, hitherto,





4 See for example Niles’ Weekly Register, Baltimore, 1811-1849.

5 Samuel Blodget, Economica, a Statistical Manual for the United States (Washington: printed for the author,
1806), p. 58.

6 Ibid., p. 57.
arrived from other countries. In 1794, the emigrants, who arrived during that year,
were estimated at ten thousand; and in 1806, Mr. Blodget said, from ‘the best records 
and estimates at present attainable,’ the emigrants who arrived in the United States 
did not average more than 4,000 per annum, for the ten preceding years. In 1794, 
the people in Great Britain were very much disposed to come to the United States; 
but this current was soon checked, by the acts of the British government. Though 
we admit, that ten thousand foreigners may have arrived in the United States in 
1794; we cannot allow that they did so, in an equal number, in any preceding or 
subsequent year, until 1817.⁷


According to data presented by Seybert, arrivals during 1817 of passengers 
from foreign countries at ten ports of the United States amounted to 22,240, 
of whom slightly more than half were from Great Britain and Ireland.⁸ For
the twenty-year period, 1790 to 1810, Seybert estimated an average annual 
immigration of 6,000 or a total of 120,000 between those years. 

In 1843 Tucker referred to this estimate of Seybert’s and noted that no data
were available for the following decade, 1810 to 1820, except Seybert’s figure
of 22,240 for the single year 1817; and, as has been seen, this was for ten prin-
cipal ports only. On this scanty material Tucker estimated immigration during
the decade before 1820 at 114,000, or a total of 234,000 since 1790.⁹

For want of better information these estimates were quoted by later writers,
such as DeBow, Bromwell, and Kapp. The latter, however, questioned Blod-
get’s estimate of 4,000 per annum for the years 1778-1794 and Seybert’s of
6,000 per annum between 1790 and 1810 as being too high. He preferred esti-
mates of 3,000 and 4,000 per annum for the two periods, respectively.¹⁰

The final estimate of immigration between the Revolutionary War and the
beginning of the official series apparently comes from Bromwell of the State
Department and Young, Chief of the Bureau of Statistics. After reviewing the
earlier estimates, Bromwell gave the round number of 250,000 immigrants be-
tween the close of the war and October, 1819, but he gave no explanation of
how he arrived at this figure.¹¹ Young, writing in 1872, estimated that 225,000
immigrants arrived between 1790 and 1820, and that an additional 25,000 came
between 1776 and 1790.¹² His estimate thus agrees with that of Bromwell, but
the basis of the estimate is not known in either case, and the total of 250,000
is close to but not identical with the earlier estimates. It is, however, the now
accepted figure. It was cited in a discussion of immigration in an 1880 census
report,¹³ and has reappeared in a number of later official accounts of immigra-
tion with the assurance that it is from “good authorities.” In quoting the esti-
mate of 250,000 the 1880 census report attached the accurate comment that
“the immigration into the United States from the close of the Revolutionary 





7 Adam Seybert, Statistical Annals of the United States of America, (Philadelphia: Dodson, 1818), p. 28. 

8 Ibid., p. 29. The ports were Boston, New York, Perth Amboy, Philadelphia, Wilmington in Delaware, Balti-
more, Norfolk, Charleston, Savannah, and New Orleans; and arrivals were shown separately for Great Britain,
Ireland, Germany and Holland, France, Italy, British North America, and the West Indies. In the fiscal year
1820 about 95 per cent of the recorded arrivals were at these ports.

9 George Tucker, Progress of the United States in Population and Wealth (New York: Hunt’s Merchants’
Magazine, 1843), p. 80.

10 Frederick Kapp, Immigration and the Commissioners of Emigration of the State of New York (New York:
Nation Press, 1870), p. 12.

11 William J. Bromwell, History of Immigration to the United States (New York: Redfield, 1856), p. 13.

12 Edward Young, Special Report on Immigration (Washington: Government Printing Office, 1872), p. v.

13 U.S. Tenth Census, 1880, Vol. I, p. 458.











970 AMERICAN STATISTICAL ASSOCIATION JOURNAL, DECEMBER 1958 


War to the year 1820 can hardly be estimated; it can only be conjectured from
very few and partial data.”¹⁴ A present-day student of the subject, after taking
into account the illegal importation of slaves after the slave trade was pro-
hibited, together with migration from Canada and other unmeasurable factors,
concludes that a figure twice as high as the officially quoted estimate of 250,000
is more reasonable.¹⁵
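The estimates for the years before the official series are built from different components and cover different periods. As an illustrative aid only (it is not part of the sources cited), the short Python sketch below rebuilds each cumulative figure from the components quoted above. The constant names and period labels are shorthand introduced here, and no figure beyond those quoted in the text is assumed.

```python
from __future__ import annotations

# Components quoted in the text; names and labels are illustrative shorthand.
SEYBERT_ANNUAL_1790_1810 = 6_000   # Seybert's average per year
SEYBERT_YEARS = 20                 # 1790 to 1810
TUCKER_1810_1820 = 114_000         # Tucker's estimate for the decade before 1820
YOUNG_1790_1820 = 225_000
YOUNG_1776_1790 = 25_000
BROMWELL_TO_1819 = 250_000         # close of the war to October 1819


def compare_estimates() -> dict[str, int]:
    """Return each writer's cumulative estimate, built from the quoted parts."""
    seybert = SEYBERT_ANNUAL_1790_1810 * SEYBERT_YEARS
    return {
        "Seybert, 1790-1810": seybert,
        "Tucker, 1790-1820": seybert + TUCKER_1810_1820,
        "Young, 1776-1820": YOUNG_1776_1790 + YOUNG_1790_1820,
        "Bromwell, war to Oct. 1819": BROMWELL_TO_1819,
        "Rubin, twice the accepted figure": 2 * BROMWELL_TO_1819,
    }


for source, total in compare_estimates().items():
    print(f"{source:<34}{total:>9,}")
```

Set side by side, only the Young and Bromwell totals coincide.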


B. The Department of State Series, 1820-1867 


The official series of immigration statistics as it appears in the Annual Reports 
and other publications uses the Department of State data for the fiscal years 
1820 to 1867 inclusive. The Act of 1819 under which the passenger lists or 
manifests were obtained called for a list of “all the passengers taken on board 
... at any foreign port or place,” the age, sex, and occupation of the passen- 
gers, the “country to which they severally belong,” the country “of which it is 
their intention to become inhabitants,” and the number of deaths on the 
voyage. This wording was repeated in the Act of 1855. The passenger lists if 
properly filled out thus would have provided the information necessary to dis- 
tinguish immigrants from the other passengers. It is not known for certain, 
however, what passengers were actually included in the immigration statistics; 
and there is, of course, no assurance that uniform practice in the compilation of
the data was followed at all the ports, especially in the earlier years.

1. Aliens or foreign born.—The reported totals were indicated to be of alien 
passenger arrivals by sea from foreign countries (Bromwell). The tables given 
by Bromwell for the first fiscal year, for example, show the following: 

Total passengers arriving by sea 
Born in the United States 


It is the last figure, 8,385, that is reported in the Department of State series. 
A first point to be noted, therefore, is that exclusion from the reported total 
was apparently on the basis of nativity or birthplace rather than citizenship. 
In other words, naturalized citizens of the United States were perhaps not
deducted, and the report, therefore, may be of foreign-born passengers rather
than alien passengers.¹⁶

2. Arrival or embarkation.—From evidence that is mentioned later,¹⁷ it ap-
pears that the reporting was not of passenger arrivals¹⁸ but, in the words of the
Act, “passengers taken on board,” without deduction of deaths during the
voyage. The number of deaths was presumably not large as a rule,¹⁹ but was
doubtless greater in the earlier days of sailing vessels and longer voyages; and 
it rose in years of famine and epidemics. According to Kapp (1870): 





14 Ibid.

15 Ernest Rubin, Immigration and the Economic Growth of the United States: 1790-1914, National Bureau of
Economic Research, Conference on Research in Income and Wealth, September 4-5, 1957, p. 3.

16 According to Davis, International Migrations, Vol. II, p. 647, persons “born in or belonging to the United
States” were presumably excluded from the statistics; but evidence of exclusion strictly on the basis of citizenship 
has not been found. 

17 See notes for the years 1856-1867 in Section 3 below. 

18 Both the original Department of State reports and Bromwell’s revision give the data as arrivals. 

19 The only mortality data that have been found for this period are for the years 1856-1865, and are given below 
in Section 3. 
A hundred, and even fifty, years ago, a sea voyage was an enterprise requiring
more than ordinary courage. A person crossing the Atlantic, regularly made his last 
will and provided for his family. 

Ten deaths among one hundred passengers was nothing extraordinary; twenty 
per cent, was not unheard of; and there were cases of 400 out of 1,200 passengers 
being buried before the ships left port. Other facts of the same kind are on record. 
Thus, of the 3,000 Palatines forwarded in 1710 by the English Government to 
New York, 470 died on the voyage, and 250 immediately after their arrival, of ship- 
fever. ... 

To give an adequate idea of recent losses of human life on board of ill-provided, 
ill-ventilated vessels, it may be stated here that out of 98,105 poor Irish emigrants 
shipped to Canada by their landlords after the great famine of 1846, during the sum- 
mer of 1847 there died 5,293 at sea, 8,072 at Gross Isle (Quarantine) and Quebec, 
and 7,000 in and above Montreal, making 20,365, besides those who afterwards 
perished whose number will never be ascertained.²⁰


Young, in summarizing the immigration statistics for the period 1820 to 1867,
stated that deaths had not been subtracted.²¹ It appears, however, that actual
arrivals were reported in 1859, 1860, and 1864.²²

Consideration of deaths during the voyage raises the interesting but numeri- 
cally minor question of whether births during the voyage were added to the 
passenger lists and included in the statistics. The only data that have been 
found on births at sea are for the fiscal years 1871 to 1891 inclusive, when they 
averaged only about 60 a year and were added to the immigrant total. From 
internal evidence, however, it is suspected that the reporting of births even in 
these years was defective. 

3. Temporary visitors and aliens in transit.—If, as the Acts of 1819 and 1855
directed, arriving aliens were asked whether it was their intention to become 
residents of the United States, it was possible to distinguish immigrants from 
aliens not intending to stay. Throughout the Department of State series, how-
ever, the total was reported as arriving alien passengers, and until 1856 the
reports did not distinguish temporary arrivals.²⁴ Estimates of the proportion of
temporary arrivals before 1856 range from 13 per cent²⁵ to 2 per cent.²⁶

The Department of State reports to Congress gave variable detail in the 
classification of passenger arrivals from 1856 to 1867 inclusive, and especially 
for the war years provided little or no summary information. The reported 
totals are given here, and are for calendar years except as noted. 

Somewhat different data for these years are found in Volume I of the 1880 
census reports and in summaries published by the Bureau of Statistics. The 
census tabulation of immigration data gave a total entitled “aggregate aliens” 
for the years 1856 to 1867 inclusive, which totals agree with the present of- 
ficial series for these years and correspond to the alien passenger totals of the 





20 Kapp, op. cit., Chapter II, pp. 20-21, 23. 

21 Young, op. cit., Table 2, footnote.

22 See detailed notes for these years in Appendix A. Compare also with data in section 3.

23 See detailed notes for the years 1871 to 1891.

24 According to the Quarterly Report of the Chief of the Bureau of Statistics, 1892-93, No. 2, p. 391, “Prior to
1856 the official statistics of arrivals of passengers from foreign countries do not distinguish those intended to make
their permanent residence in this country from merely transient passengers or sojourners, but there were during
that time comparatively few of the latter.”

25 Young, op. cit., Table 2, footnote.

26 United States Bureau of Statistics, Arrivals of Alien Passengers, etc., General Review.
Year      Passengers(a)   Born in U.S.   Aliens       Passengers meaning
                                                      to reside in U.S.(f)
1856      224,496         24,060         200,436      200,002
1857      271,982         20,676         251,306      243,562
1858      144,906         21,780         123,126      140,611
1859      155,302         34,227         121,282(b)   150,824(b)
1860      179,691         26,051         153,640      173,491
1861      112,705         —              —
1862      114,475         —              —
1863      199,811         —              —
1864      221,535         —              —
1865      287,397         38,794         248,603
1866(c)   286,496         29,418         257,078
1867(d)   372,461         40,159         332,302      327,383(e)

(a) Deaths excluded from the totals for 1859, 1860, and 1864; the treatment of deaths in the other years not
stated.
(b) Deaths included in this total.
(c) For 9 months to September 30.
(d) For 12 months to September 30.
(e) Aliens only.
(f) Not stated: 20,315 in 1856; 24,483 in 1857; 923 in 1858.


Department of State series for some but not all years. Aliens not intending to
reside in the United States and deaths on voyage were then subtracted to give
immigration totals comparable to the Bureau of Statistics immigration series.
The revision from alien passengers or “aggregate aliens” to immigrants reduced
the totals on the average by about 1½ per cent. (See 1880 Census series.)

From time to time the Bureau of Statistics published a summary of the im- 
migration statistics including this 12-year period, in a form consistent with its 


FROM 1880 CENSUS

          Aggregate    Aliens not intending   Died on   Total
Year      aliens       to reside in U.S.      voyage    immigrants
1856      200,436      4,179                            195,857
1857      251,306      3,937                            246,945
1858      123,126      3,371                            119,501
1859      121,282      2,459                            118,616
1860      153,640      3,181                            150,237
1861      91,920(a)    2,098                            89,724
1862      91,987(b)    2,819                            89,007
1863      176,282      1,690                            174,524
1864      193,418      221                              193,195
1865      248,120      658                              247,453
1866      318,568      3,651                            314,917
1867      315,722      4,757                            310,965

(a) Official series now gives 91,918.
(b) Official series now gives 91,985.
own data and with the same numbers of immigrants for the separate years as in
the 1880 Census tabulation.²⁷
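The reduction of “about 1½ per cent” can be checked against the 1880 Census figures for aggregate aliens and total immigrants. The minimal Python sketch below assumes the figures as printed in that tabulation, including the 1861 and 1862 values printed there rather than the later official ones. Since the text does not say which kind of average is meant, it computes both the reduction over the whole period and the unweighted mean of the yearly reductions.

```python
# (aggregate aliens, total immigrants) by year, as printed in the 1880 Census table.
CENSUS_1880 = {
    1856: (200_436, 195_857),
    1857: (251_306, 246_945),
    1858: (123_126, 119_501),
    1859: (121_282, 118_616),
    1860: (153_640, 150_237),
    1861: (91_920, 89_724),
    1862: (91_987, 89_007),
    1863: (176_282, 174_524),
    1864: (193_418, 193_195),
    1865: (248_120, 247_453),
    1866: (318_568, 314_917),
    1867: (315_722, 310_965),
}


def pooled_reduction(rows):
    """Share of all aggregate aliens removed by the revision, all years together."""
    aliens = sum(a for a, _ in rows.values())
    immigrants = sum(i for _, i in rows.values())
    return aliens, immigrants, (aliens - immigrants) / aliens


def mean_yearly_reduction(rows):
    """Unweighted mean of the share removed in each separate year."""
    shares = [(a - i) / a for a, i in rows.values()]
    return sum(shares) / len(shares)


aliens, immigrants, pooled = pooled_reduction(CENSUS_1880)
print(f"{aliens:,} aliens -> {immigrants:,} immigrants")
print(f"pooled reduction: {pooled:.2%}")
print(f"mean of yearly reductions: {mean_yearly_reduction(CENSUS_1880):.2%}")
```

With the figures as printed, the pooled reduction is about 1.53 per cent, in line with the text. The unweighted mean of the yearly shares is a little higher, about 1.75 per cent, because the proportionally larger deductions fall mostly in the smaller years before 1863.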

A little information on aliens in transit was given for several of the last years 
of the Department of State series. For the last four years of the Department 
of State series, 1867 to 1870 inclusive, the following information was given for 
arriving alien passengers. The periods of reporting, it should be noted, do not 
correspond to the fiscal years of the official series, nor do these Department of 
State data agree with the Bureau of Statistics reports for the same years. 




















                               Alien        Intend to    In
                               passengers   remain       transit
Year ended Sept. 30, 1867      332,302      327,383      3,762
Calendar year 1868             161,167
Calendar year 1869                                       4,419
Calendar year 1870             284,815      280,278      1,664
Total                          1,123,937    1,059,083    9,845





The 55,009 alien passengers unaccounted for in the last two columns were pre- 
sumably temporary visitors. 
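The figure of 55,009 follows directly from the totals row of the table. A short Python check, using only the three printed totals (the function name is illustrative):

```python
# Totals of the Department of State table for 1867-1870, as printed.
ALIEN_PASSENGERS = 1_123_937
INTEND_TO_REMAIN = 1_059_083
IN_TRANSIT = 9_845


def unaccounted(total, remaining, transit):
    """Alien passengers classified neither as intending to remain nor as in transit."""
    return total - remaining - transit


residual = unaccounted(ALIEN_PASSENGERS, INTEND_TO_REMAIN, IN_TRANSIT)
print(f"{residual:,} unaccounted, {residual / ALIEN_PASSENGERS:.1%} of all alien passengers")
```

On this reading the presumed temporary visitors made up roughly 5 per cent of the alien passengers reported for these years.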

4. Fiscal year.—The official series reports alien passengers according to 
fiscal year, and the period of reporting changed several times during the years 
spanned by the Department of State series. From 1820 to 1831 inclusive the 
fiscal years ended September 30. A change was then made to a fiscal year coin- 
ciding with the calendar year; and the reported immigration total for 1832 
covered the 15-month period from October 1, 1831 to December 31, 1832. The 
statistics for 1833 to 1842 inclusive are for years ending December 31. The 
fiscal year ending September 30 was readopted in 1843, and the immigration 
total for that year covered only the 9 months from January 1 to September 30. 
Fiscal years ending on September 30 continued from 1844 to 1849 inclusive. 
Return to the calendar year basis of reporting in 1850 lengthened the period 
covered by the report for that year to the 15 months from October 1, 1849 
to December 31, 1850. The calendar year basis of reporting continued thereafter 
to 1867 inclusive. 
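These changes of reporting period are easy to misapply when the annual series is used. The Python sketch below is illustrative only; its function names are hypothetical. It encodes just the dates given in this paragraph, not the differing periods noted earlier for some of the Department of State reports to Congress. It returns the period covered by each reported year and checks that the periods run on without gap or overlap from October 1, 1819 to December 31, 1867.

```python
from __future__ import annotations

from datetime import date, timedelta


def reporting_period(year: int) -> tuple[date, date]:
    """First and last day covered by the total reported for `year`, 1820-1867."""
    if not 1820 <= year <= 1867:
        raise ValueError("year outside the Department of State series")
    if year == 1832:   # 15 months
        return date(1831, 10, 1), date(1832, 12, 31)
    if year == 1843:   # 9 months
        return date(1843, 1, 1), date(1843, 9, 30)
    if year == 1850:   # 15 months
        return date(1849, 10, 1), date(1850, 12, 31)
    if year <= 1831 or 1844 <= year <= 1849:
        # Fiscal year ending September 30.
        return date(year - 1, 10, 1), date(year, 9, 30)
    # 1833-1842 and 1851-1867: calendar years.
    return date(year, 1, 1), date(year, 12, 31)


def months_covered(year: int) -> int:
    start, end = reporting_period(year)
    return (end.year - start.year) * 12 + (end.month - start.month) + 1


def periods_are_contiguous() -> bool:
    """True if each period begins the day after the previous one ends."""
    return all(
        reporting_period(y)[0] == reporting_period(y - 1)[1] + timedelta(days=1)
        for y in range(1821, 1868)
    )


print({y: months_covered(y) for y in (1831, 1832, 1843, 1850)})
print(sum(months_covered(y) for y in range(1820, 1868)), periods_are_contiguous())
```

The 579 months equal 48 years of 12 months plus the net 3 months from the 1832, 1843, and 1850 changes (+3, -3, +3). This is a quick way to see that no period is counted twice or skipped.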

5. Nativity or nationality.—The Act of 1819 specified that the passenger
lists include information on “the country to which [the passengers] belong.” 
If unambiguous at the time, it is not now clear whether this is to be interpreted 
to mean country of citizenship, country of last permanent residence, or country 
of birth. Bromwell, who was in the Department of State and whose report of 
the Department of State immigration statistics is presumably authoritative, 
gives the information as “country where born.” Later, tables published by the 
Bureau of Statistics in summary of the Department of State immigration data 
are entitled “Arrivals, by nationality,” but indicate in the body of the table 
that the classification is according to country of last permanent residence. The 





27 See Quarterly Report of the Bureau of Statistics, 1888-89, No. 3, p. 825 or any other summary issued by the
Bureau. The number of returning citizens was given for each year in addition to the number of alien passengers. 
latter was used by the Bureau of Statistics in its own immigration material.
Examination of original passenger lists might shed light on the basis of classi-
fication by country used in the Department of State series, but would not
necessarily show what practice was followed by the ships’ officers and others
who prepared the passenger lists.

6. Land border arrivals.—The Acts of 1819 and 1855 concerned only arrivals
by vessel, and no provision was made in these or other acts of the period for 
the recording of arrivals over the land borders. Such arrivals would have 
included natives of Canada and Mexico, together with aliens who proceeded 
to the United States by way of these adjoining countries. Arrivals from Mexico, 
especially of the latter type, are thought to have been few in number during 
the years included in the Department of State series; but arrivals from Canada 
of natives of that country and of Europeans who came via Canadian ports 
were believed to be numerous.²⁸ Several estimates of the volume of immigration
from Canada are noted below. 

Bromwell’s statistics of passenger arrivals according to country of birth show 
considerable numbers of arrivals of natives of Canada and Mexico. In the 
period covered by his material, from October 1, 1819 to December 31, 1855, the 
reported number of natives of British America or Canada arriving in the United 
States was 91,699, and of natives of Mexico, 15,969. These he stated to be 
arrivals by sea, but the classification of arrivals by port or place of entry in- 
dicates that not only seaports were included. The returns for the fiscal year 
1820 included 14 arrivals at the lake port of Sandusky, Ohio. Oswegatchie, an 
inland town in upper New York State, appears in the report for 1821 and sev- 
eral following years. 

In 1853 the Department of State began efforts to secure greater uniformity in 
the returns submitted by the collectors of customs, for it was recognized that 
the immigration reports of the Department “could lay no claim to that accuracy 
and comprehensiveness of detail contemplated by the act in accordance with 
which they were made.” A circular was sent to collectors, a schedule was pro- 
vided for their guidance in reporting passenger arrivals, and in the letter from 
the Secretary of State it was stated that “although the Act of 1819 seemed to 
have reference only to passengers arriving by sea, the attention of collectors at 
frontier custom-houses, especially on the northern border, should be directed 
to immigrants entering the country by land.” The effort to secure reports of 
land border arrivals was continued in the following years, an amendment to 
the 1819 act was suggested but was not adopted, and the Secretary of State 
continued to refer to the matter in the letters accompanying his annual reports 
of immigration statistics to Congress.²⁹

According to a later account from the Bureau of Statistics, in 1889: 


Neither the act of March 2, 1819, nor any subsequent act provided means for 
collecting statistics of immigrants arriving otherwise than in vessels, and no attempt 





28 Concerning the well-travelled route to the United States via Canada, see for example Marcus Hansen, The
Atlantic Migration (Cambridge: Harvard University Press, 1941), pp. 254-255; and Oscar Handlin, Boston’s Im-
migrants (Cambridge: Harvard University Press, 1941), p. 54.

29 See letters of the Secretary of State accompanying his reports to Congress for the years 1853 to 1856.
was made to ascertain the number of immigrants crossing the frontier into the
United States either from Mexico or Canada until 1860, when, by direction of the 
Secretary of the Treasury, the collectors of customs at Detroit and Port Huron made 
returns to the Department of State of immigrants arriving at those ports from and 
through Canada, by railway trains and ferries, so far as they could learn the number, 
nationality, etc., from inspection and inquiry of the passengers.³⁰


According to the same account, “No statistics of immigrants arriving by land- 
carriage from Mexico have ever been collected.” Actually, the effort to include 
land border arrivals began in 1853, and returns for Oswego, New York, began 
in 1855. (See detailed notes, Appendix A.) Thereafter land border or inland 
arrivals were reported on a somewhat irregular and admittedly incomplete 
basis. Some inland places failed to make returns for every quarter, some re- 
ported total passengers, others counted only foreign passengers, and some gave 
only the number of foreign passengers who intended to reside in the United 
States. The recording of land border arrivals, furthermore, was largely aban- 
doned during the war years in the 1860’s, but was resumed and extended to more 
places in 1865 and after. In practice it was limited to the Canadian border. 

The attempt to collect statistics of arrivals from Canada was finally aban- 
doned in 1885, as will be more fully described below in another section. It was 
the later comment of the Bureau of Statistics on the collection of statistics of 
immigrant arrivals from adjacent countries that “as there was no law providing 
for the collection of these statistics, their accuracy depended entirely upon the 
vigilance of collectors of customs on the frontier,” and the incompleteness of 
the data was well known. 

Several estimates were made of the volume of immigration from and through 
Canada before 1860. Chickering, as cited later by Auld, estimated that 67,993 
immigrants from England, Scotland, and Ireland came to the United States 
through Canada in the decade of the 1820’s, and that an additional 199,130 
from the same source and by the same route came in the next decade.³¹ It was
his opinion that the reported immigration totals should be increased by 50
per cent to allow for arrivals from Canada.³²

An estimate of the uncounted immigration of Canadians and of Europeans 
coming by way of Canada between 1815 and 1860 was made by Jarvis. The 
remigration of Europeans from Canada to the United States he estimated from 
the reduction of the number of Europeans enumerated in successive Canadian 
censuses, after allowance for mortality and immigration. The migration of 
Canadians or “Provincials” to the United States he derived from the number of 
Canadians enumerated in the United States in 1850 and 1860, with an upward 
adjustment to allow for deaths after immigration. According to his summary: 

Thus those natives of Europe and the British Provinces who came from and
through Canada and New Brunswick unnoticed by the American officers and not
included in the immigration reports were:







30 Tables Showing Arrivals, etc., introductory statement by the Chief of the Bureau of Statistics.

31 J. D. B. DeBow, The Industrial Resources, Statistics, etc., of the United States (3d ed., 3d Vol.; New York:
Appleton, 1854), p. 424.

32 Ibid., p. 396.
In periods       Provincials   Europeans   Totals
1815 to 1820     —             12,157      12,157
1820 to 1830     5,325         26,524      31,849
1830 to 1840     26,623        56,364      82,987
1840 to 1850     85,576        90,718      176,294
1850 to 1860     82,787        9,053       91,840
                 200,311       194,816     395,127





These should be added, in their respective periods, to the numbers of immigrants
given in government reports.³³
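Before Jarvis’s figures are added to the reported series, it is worth confirming that his table is internally consistent. A minimal Python check, assuming the figures as transcribed here and treating the blank Provincials entry for 1815-1820 as zero (the names are illustrative):

```python
# Jarvis's estimates of uncounted arrivals from and through Canada,
# (Provincials, Europeans) by period. No Provincials figure is printed
# for 1815-1820; it is treated as zero here.
JARVIS = {
    "1815 to 1820": (0, 12_157),
    "1820 to 1830": (5_325, 26_524),
    "1830 to 1840": (26_623, 56_364),
    "1840 to 1850": (85_576, 90_718),
    "1850 to 1860": (82_787, 9_053),
}
PRINTED_ROW_TOTALS = {
    "1815 to 1820": 12_157,
    "1820 to 1830": 31_849,
    "1830 to 1840": 82_987,
    "1840 to 1850": 176_294,
    "1850 to 1860": 91_840,
}
PRINTED_GRAND_TOTALS = (200_311, 194_816, 395_127)


def recompute(rows):
    """Row totals and column totals recomputed from the two component columns."""
    row_totals = {period: p + e for period, (p, e) in rows.items()}
    provincials = sum(p for p, _ in rows.values())
    europeans = sum(e for _, e in rows.values())
    return row_totals, (provincials, europeans, provincials + europeans)


rows, grand = recompute(JARVIS)
print(rows == PRINTED_ROW_TOTALS, grand == PRINTED_GRAND_TOTALS)
```

Both comparisons print True, so every row and column total in the table agrees with its components.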


To counterbalance the uncounted arrivals of aliens by way of Canada and 
Mexico there were, no doubt, other aliens destined for those countries who came 
by way of the United States and were, therefore, counted among the alien 
passenger arrivals. Tucker, in a critique of the immigration statistics, assumed
that one-third of the British migrants who arrived at New York in the 1830’s
went on to Canada, but considered that this was a liberal estimate of the pro-
portion going to Canada.³⁴ Auld, in comment on the immigration statistics for
the period 1820 to 1840, made the convenient assumption that the departures
to Canada were balanced by errors of omission in the immigration data. In his
words:


During the same time a considerable number are supposed to have landed in 
New York with the purpose of pursuing their route to Canada; but it is probable 
that the number of these was balanced by the omissions in the official returns.³⁵


No doubt the official returns were incomplete, but we have little basis for 
estimating either the number of omissions at this time or the number of aliens 
who entered in transit through the United States. 

7. Evaluations.—The preceding notes indicate some questions about the 
Department of State data, and give information to aid in the interpretation of 
the published series. There is also the question of the quality of the data or the 
care with which they were compiled. Here the comments of contemporary or 
nearly contemporary observers are especially valuable; and there are some fur- 
ther comments by more recent investigators of the immigration statistics. In- 
formation with respect to specific years is given below in the detailed notes, but 
the following evaluative comments concern the Department of State series as a 
whole. 

Bromwell, in the preface to the History of Immigration, stated that his data were
derived entirely from the following official documents: 

First, and chiefly, from the Annual Reports on Immigration prepared at the 
Department of State, and by the Secretary communicated to Congress in compliance 
with a requirement of the Passenger Act of March 2, 1819. 


Secondly, from Passenger Abstracts transmitted to the Secretary of State by 
Collectors of the Customs, and on file in the Department, yet not embraced in the





33 Edward Jarvis, “Immigration,” Atlantic Monthly, Vol. 29 (April, 1872), p. 456.

34 George Tucker, Progress of the United States in Population and Wealth (1855 ed.; New York: Hunt’s Mer-
chants’ Magazine), pp. 83-84.

35 DeBow, op. cit., p. 424.
Annual Reports on Immigration, because not received until those Reports had been
completed and laid before Congress. 

Thirdly, from such custom-house records as furnished immigration statistics never 
communicated to the Secretary, or which, if ever communicated, are now missing 
from the files of the Department.³⁶


These sources, Bromwell asserted, constituted “all the available official infor- 
mation of importance in possession of the country relative to immigration.” As 
his statement indicates, his series is a correction of the reports to Congress; 
and it now provides the official immigration statistics for the fiscal years 1820 
to 1855 inclusive. Concerning the reports, Bromwell wrote that “many of them 
are without method, have no recapitulations appended to them, and, as
published, contain numerous typographical as well as clerical errors.”³⁷ In
agreement with this is the opinion expressed later in the Census of 1880, where it was said that:
From and after 1820 we have official statistics of immigration. It is believed that 
during the earlier decades of this period, and indeed until within the last fifteen years, 
the returns on which these statistics are based were in general made somewhat
carelessly by the collectors of the several ports of arrival, and were subjected to little or
no intelligent and systematic scrutiny. . . .³⁸


Bromwell’s account of his correction of the series indicates that reports from 
some collectors of customs and for some quarters were not received or were 
omitted from the published statistics; and it is not known to what extent he was 
able to correct for such gaps in the data. His own tabulations by port of arrival 
show considerable variation in the number of ports included. For the first fiscal 
year, 34 ports or customs districts are shown, from Belfast and Waldoboro in 
Maine to New Orleans; and a few arrivals at the lake port of Sandusky in Ohio 
are included. In later years there was a shrinking of the number of ports, to only 
18 or 20 in some years; and for the quarter ended December 31, 1832, returns 
were received only from Boston and Charlestown in Massachusetts and from 
New York City. In the early days of sailing vessels and shallower draft more 
ports received transatlantic shipping and passenger arrivals were more widely 
dispersed. As Rubin has pointed out, the change to steamships and the con- 
centration of transatlantic trade in the hands of a few companies and at a few 
ports facilitated the collection of immigration statistics.³⁹

Tucker, who wrote before Bromwell’s corrected series was available, was 
equally critical of the original Department of State data. To quote his com- 
ments at some length: 

From 1820 to 1830, when the collectors of the customs were required to report to 
the State department the number of foreigners who had arrived in their respective 
ports by sea, we might have expected entire accuracy; but these reports are so much 
at variance with other documents, entitled to respect, and are confessedly so defective, 
that they cannot be relied on. Thus, to give an example, the number of emigrants 
who left the United Kingdom in 1829 for the United States, was, according to the 
British official returns, 15,678; yet the whole number of foreign emigrants from all
parts of the world, reported to the State department in the same year, was but
15,285, there being, besides less important omissions, that of New York for the third
quarter. Again, the number of foreign emigrants returned to the State department
for 1830, is but 9,466, though 30,224 landed in New York alone in that year, for the
whole of which the proper officers had failed to make any return. In consequence of
these, and like instances of failure of duty, the number of foreign emigrants returned
to the State department for the six years from 1825 to 1830, both inclusive, was
only 87,140; whilst the number who emigrated from the United Kingdom to the
United States for the same six years, according to the official accounts in that
country, was 80,522, which allows but 6,618 for the number of emigrants to the
United States from all the other parts of the world, though it is known that these
(including the emigrants from the rest of the British dominions) are nearly equal to
the number from the United Kingdom.⁴⁰

36 Bromwell, op. cit., Preface.

37 Ibid.

38 Census of 1880, Vol. I, pp. 458-459.

39 Rubin, op. cit., p. 2.


Immigration for the decade 1820 to 1830 he estimated at about 200,000, con- 
siderably in excess of the official figures. 
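Tucker's case rests on a few subtractions, which can be checked directly. The Python sketch below simply reworks the figures he quotes; the variable and function names are illustrative, and the last step is an assumption introduced here (reading his "nearly equal" as exactly equal), not a calculation Tucker reports.

```python
# Figures quoted by Tucker for the Department of State returns and the
# British official emigration returns (names are illustrative only).
REPORTED_TO_STATE_1825_1830 = 87_140  # all foreign arrivals reported, 1825-1830
UK_TO_US_1825_1830 = 80_522           # British returns, UK to United States


def residual_for_other_origins(reported_total: int, uk_total: int) -> int:
    """Arrivals left over for every country of origin other than the UK."""
    return reported_total - uk_total


residual = residual_for_other_origins(REPORTED_TO_STATE_1825_1830, UK_TO_US_1825_1830)
print(residual)  # 6618

# 1829: the UK alone sent more emigrants than the State returns show from all countries.
uk_1829, reported_1829 = 15_678, 15_285
print(uk_1829 - reported_1829)  # 393

# Assumption: read Tucker's "nearly equal" as equal, so the true six-year total
# would be about twice the UK figure; the gap is then a rough omission estimate.
implied_total = 2 * UK_TO_US_1825_1830
print(implied_total - REPORTED_TO_STATE_1825_1830)  # 73904
```

On that reading roughly 74,000 arrivals would be missing from the six-year returns, which points in the same direction as his higher estimate for the decade; the doubling is only a rough proxy.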

Concerning the decade 1831 to 1840 inclusive, Tucker believed that the re- 
turns for 1831 and 1832 were too low, for according to his information arrivals 
at the port of New York exceeded those officially reported for the entire country. 
Omissions for the subsequent years of the period, however, he considered to be 
comparatively small.⁴¹

Davis, a more recent critic of the immigration statistics, noted discrepancies 
between the Congressional documents containing statistics of passenger ar- 
rivals and the Department of State reports, and commented that the latter 
were “full of minor omissions and discrepancies.” A tabulation of the
discrepancies is contained in Davis’ critique of the data.⁴² The discrepancies may
be explained at least in part by Bromwell’s remarks as quoted above and by his 
corrections of the Department of State data. 

In addition to those mentioned above there are other probable sources of 
deficiency in the immigration statistics derived from the passenger lists; but 
there is little or no basis for estimating the extent of the deficiency. The Civil 
War during the later years of the Department of State series may have reduced 
the efficiency of the statistical work of the Federal government; and the pre- 
sumably small number of arrivals at southern ports during the war years were 
not included in the national total.⁴³ Rubin has called attention to the smuggling
in of slaves after 1808 as a source of deficiency in the immigration record, and 
to the lack of information about the number brought in before that time. He 
cites an estimate that at least 270,000 were brought in between 1808 and 1860.⁴⁴

Even less is known statistically of the smuggling in or surreptitious entry of 
aliens, which may have approached or even exceeded the volume of the illicit 
slave trade. From early in the nineteenth century until finally declared un- 
constitutional in 1876, a number of states had laws levying head taxes on im- 
migrants or requiring that shipowners be bonded to indemnify local authorities 
charged with the relief of recently arrived aliens. States with such laws at one 
time or another included Maine, New Hampshire, Massachusetts, New York, 
Pennsylvania, Maryland, Alabama, Louisiana, and Texas. Contemporary reports
indicate that considerable numbers of immigrants were landed at out-of-the-way
places along the coast in order to evade these requirements; and such
arrivals were certainly not included in the immigration statistics.⁴⁵

40 Tucker, op. cit., pp. 82-83.

41 Ibid., pp. 84-85.

42 International Migrations, Vol. II, pp. 647-48.

43 During the war years reports continued to be received from Key West in Florida, and other southern ports
reappeared as they returned to Federal control.

44 Ernest Rubin, The Demography of Slavery in the United States, 1790-1860, National Bureau of Economic
Research, Conference on Income and Wealth, September 4-5, 1957.


C. The Bureau of Statistics Series, 1868-1891 


The Bureau of Statistics, Treasury Department, series constitutes the of- 
ficial immigration series from 1868 to 1891 inclusive. All the reports of this 
series were for a fiscal year ended June 30.⁴⁶ The Bureau’s publication of
immigration statistics began with a report of passenger arrivals for the quarter
ended September 30, 1867.⁴⁷ Of the 95,281 passengers who arrived during that
quarter, 13,948 were classified as non-immigrants, of whom 12,623 were
recorded as “belonging to the United States” and an additional 1,325 were aliens
who did not intend to remain in the United States. The report further classified 
the passengers into six nationalities, United States included. Later quarterly 
statements in the Monthly Reports gave additional information on the port of 
arrival, sex, age group, and occupation of the passengers. 

Publication of the Bureau of Statistics quarterly statements on immigration 
continued in the Monthly Reports of the Director of the Bureau of Statistics 
through the first quarter of 1869. Thereafter, the series was continued, as de- 
scribed in Part I above, in the Monthly Reports on the Commerce and Navigation 
of the United States and the succeeding publications of the Bureau. A classifica- 
tion of passengers into cabin and non-cabin class began with the statement for 
the quarter ended September 30, 1869, and the number of deaths on voyage was 
also reported. 

1. Comparison with Department of State series.—The Bureau of Statistics
obtained its immigration data from the reports of the collectors of customs, 
the same source as the preceding series. The basis of reporting appears to have 
remained the same as in the Department of State series except for one im- 
portant change toward a more refined definition of immigrant. This change was 
the exclusion of the temporary visitors, the “persons not intending to remain in 
the United States,” who had been distinguished in the statistical reports since 
1856 but who remained in the official totals from 1856 to 1867 inclusive. Com- 
parison of the two series in the years 1867 to 1870 during which they were both 
issued shows that the Bureau of Statistics totals were about 14 per cent lower. 
This is presumably due to the exclusion of the transients, for it is approximately 
their proportion of alien passenger arrivals during the preceding decade. 

45 See for example Marcus Hansen, op. cit., p. 257. Edith Abbott’s Historical Aspects of Immigration Problems,
p. 769, contains a speech (1849) against immigration in which the following statement about the immigration
statistics is made: “No returns are made by any ships under twenty tons, and from some cause, none from the ports of
New Jersey. It is well known that a great many vessels neglect to make any returns, and also that most of the
immigrants who land at Halifax and Quebec come through the British provinces into the United States. The great
number of timber ships which sail annually from England to Quebec afford large and cheap facilities to the
immigrants by that voyage, and great numbers annually come by it. It is estimated that one-fourth of the immigrants
to the United States are either through the British provinces or through our ports and not amereed.” The speaker,
however, gave no sources for his information.

46 In adjustment to the new fiscal period, the immigration statistics for 1868 are for the six months ended
June 30, 1868.

47 Monthly Report of the Director of the Bureau of Statistics, No. 11 (November 20, 1867), p. 11.

2. Basis of reporting immigration.—The totals given in the Bureau of Statistics
series were stated to be passenger arrivals, and with the removal of citizens
and transients the indicated definition of an immigrant for statistical purposes
was an arriving alien who declared the intention of remaining in the United
States.
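The operating definition just described amounts to a subtraction from the passenger list. As a minimal Python sketch, assuming each passenger falls into exactly one of the three groups (the class and field names are hypothetical), the figures of the first quarterly report, for the quarter ended September 30, 1867, give:

```python
from dataclasses import dataclass


@dataclass
class QuarterlyPassengerReturn:
    """Hypothetical record of one quarter's passenger arrivals."""
    passengers: int        # all passengers arrived in the quarter
    citizens: int          # recorded as "belonging to the United States"
    transient_aliens: int  # aliens not intending to remain

    def non_immigrants(self) -> int:
        return self.citizens + self.transient_aliens

    def immigrants(self) -> int:
        # An immigrant is an arriving alien who declared the intention of
        # remaining, so remove citizens and transients from the passenger count.
        return self.passengers - self.non_immigrants()


# Quarter ended September 30, 1867 (first Bureau of Statistics report).
q_sep_1867 = QuarterlyPassengerReturn(passengers=95_281, citizens=12_623, transient_aliens=1_325)
print(q_sep_1867.non_immigrants())  # 13948
print(q_sep_1867.immigrants())      # 81333
```

The sketch reproduces the 13,948 non-immigrants of that report and leaves 81,333 immigrants; it takes no account of deaths on voyage, whose treatment in the series is uncertain.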

It is assumed that only actual arrivals were counted, and that deaths on 
voyage were deducted from the passenger lists; but the only evidence for this 
assumption is the designation of the data as passenger arrivals. An explicit 
statement concerning the treatment of deaths in the immigration statistics has 
not been found, but the number of deaths on voyage was recorded along with 
passenger arrivals in the Bureau’s reports, beginning with the third quarter 
of 1869. 

A classification of passengers according to cabin or non-cabin class in an 
early year of the Bureau of Statistics series indicates that cabin class alien 
passengers were included as immigrants unless they were transient visitors. 
Unfortunately, this classification was not continued in the quarterly statements 
or annual summaries, so that the later treatment of cabin passengers in the
immigration statistics cannot be followed precisely.⁴⁸

The period of reporting throughout the Bureau of Statistics series was a 
fiscal year ending June 30. In adjustment to this period of reporting from the 
previous calendar year basis, the immigration total reported for 1868 covered 
only the six-month period from January 1 to June 30. 

3. Country of origin.—The wording of the manifest or passenger list requirement
remained unchanged during this period, including the imprecise phrase
calling for information on the country to which passengers belonged. The first 
quarterly statements from the Bureau of Statistics classified arriving passen- 
gers according to “nationality.” Ten years later the Bureau still presented a 
table on nationality in its annual statement on immigration, but had added the 
further explanatory phrase, “country of last permanent residence or
citizenship.” This was perhaps in acceptance of the situation that collectors of
customs and ships’ officers filling out the manifests set down country of nativity,
country of citizenship, or country of last permanent residence at their own 
discretion. 

4. Land border arrivals.—Arrivals at inland places and lake ports continued 
to be included in the quarterly statements on immigration, although with some 
irregularity in the number of places reporting and with no reports for some 
quarters. Included among the arrivals via Canada were considerable numbers 
of Europeans. The collection of immigration statistics at the land borders or 
lake ports was finally suspended on June 30, 1885.⁴⁹ According to the
explanation given later by the Bureau of Statistics and already quoted in part:

... as there was no law providing for the collection of these statistics, their ac- 
curacy depended entirely upon the vigilance of collectors of customs on the frontier. 
During the recent years, the means of communication between this and adjacent
countries by railroads increased to such an extent that it was found impracticable to
collect accurate data by mere inspection of the passengers on railway trains. . . .⁵⁰

48 See detailed notes for the last years of the Bureau of Statistics series, in Appendix A. The data show indirectly
that some cabin class passengers must have been classified as immigrants.

49 Nevertheless, a few arrivals at inland places are shown for the fiscal year ended June 30, 1886. Quarterly
Report of the Chief of the Bureau of Statistics, 1892-93, No. 2, p. 401.

50 Bureau of Statistics, Arrivals of Alien Passengers, 1893, pp. 12-13.

The counting of arrivals via the land borders was not resumed until after the
termination of the Bureau of Statistics series. 

With the end of the attempt to count immigrants arriving at the land borders, 
the counting of Canadian and Mexican immigrants coming by sea was also 
stopped, for it was realized that the seaport data gave a very incomplete record 
of the immigration from those countries. The immigration statistics for 1886 
and the following years, in other words, were not only limited to immigrant 
arrivals by vessel but also omitted migrants from Canada and Mexico. Legal 
authority and administrative regulations did not provide for the counting of 
migrants from these countries until the fiscal year 1908.⁵¹

The numbers of Canadian and Mexican immigrants, consisting of those arriving
by sea plus those recorded at the land borders, for the years 1879 to 1885
inclusive, were as follows:⁵²











Year   British North American Possessions   Mexico
1879                 31,268                   556
1880                 99,706                   492
1881                125,391                   325
1882                 98,295                   366
1883                 70,241                   469
1884                 60,584                   430
1885                 38,291                   323











During these years the incompletely recorded immigration from Canada and 
Mexico averaged more than one-seventh (nearly 14.6 per cent) of all recorded 
immigration. (For the proportion of migrants from Canada and Mexico when 
they were again included, and for an estimate of the number of European
immigrants who came by way of Canada between 1885 and 1891, see Part II D,
Section 3 below.) 
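The table can be summed directly. A short Python sketch (variable names are illustrative) gives the column totals and the peak year. The 14.6 per cent share cited in the text also requires the total recorded immigration at all ports, which is not reproduced in this table, so the sketch stops at the totals.

```python
# Canadian and Mexican immigrants (sea arrivals plus land-border counts), 1879-1885.
BRITISH_NORTH_AMERICA = {1879: 31_268, 1880: 99_706, 1881: 125_391, 1882: 98_295,
                         1883: 70_241, 1884: 60_584, 1885: 38_291}
MEXICO = {1879: 556, 1880: 492, 1881: 325, 1882: 366,
          1883: 469, 1884: 430, 1885: 323}

# Combined contiguous-country immigration for each year.
combined = {year: BRITISH_NORTH_AMERICA[year] + MEXICO[year] for year in BRITISH_NORTH_AMERICA}

total_bna = sum(BRITISH_NORTH_AMERICA.values())
total_mexico = sum(MEXICO.values())
print(total_bna, total_mexico, sum(combined.values()))  # 523776 2961 526737

# Year with the largest combined count.
print(max(combined, key=combined.get))  # 1881
```

Nearly all of the recorded movement came from the British North American possessions; Mexico contributed well under 1 per cent of the combined total.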

5. Reentry of aliens.—The immigration statistics for this period, and indeed 
to some extent up to the present day, are inflated by the reentry of aliens of the 
immigrant class who have been admitted previously and then departed. The 
volume of reentries is incompletely known, and has generally been assumed to 
be small or has been neglected. That it may have been greater than generally 
assumed is indicated by Berthoff, for one, in his British Immigrants in Industrial 
America, where he notes that transatlantic fares were sufficiently low and the 
wage difference between the New World and the Old high enough so that “some 
British artisans were able to undertake an annual round trip for a summer’s 
work in the United States,” and mentions the coming of seasonal laborers for
employment as quarrymen, stonecutters, workers in the building trades, etc.⁵³

51 Whether because of misunderstanding of instructions or for other reasons not now known, some Canadians
and Mexicans were in fact included in the immigration statistics of most years from 1886 to 1907 inclusive. For the
numbers so included, see Part II D, Section 3 below.

52 Quarterly Report of the Chief of the Bureau of Statistics, 1888/89, No. 1, for the quarter ended September 30,
1888, pp. 208-211. The data are further classified by sex, and by province of Canada.

53 Rowland T. Berthoff, British Immigrants in Industrial America (Cambridge: Harvard University Press,
1953), pp. 17, 80, 82.

The one group of immigrants whose reentry is found to have been recorded
is the Chinese who were readmitted if provided with legal certificates of previous
residence in accordance with the Act of May 6, 1882. The number of such re- 
admissions by fiscal year was: 


The readmission of Chinese by certificate of residence was discontinued October 
1, 1888.⁵⁵


D. The Bureau of Immigration or INS Series, 1892 to date 


The present series began with the establishment of a separate Bureau of 
Immigration in 1892, and has been continued by the various successors of this 
Federal agency, now the Immigration and Naturalization Service (INS). 
Although a continuity of organization has been maintained since 1892, the 
immigration statistics are not an entirely uniform series during these years, 
for a number of changes have been made in the composition of the data since 
1892 in order to reach a more explicit and more precise definition of who is an 
immigrant for statistical purposes. As was done for the preceding series, the 
principal known changes are described below and additional information is 
given in a later section of detailed notes for separate years. 

1. Comparison with Bureau of Statistics series.—The Bureau of Immigration
provided the official immigration statistics beginning with the fiscal year ended
June 30, 1892.⁵⁶ As stated in the first annual report of the Bureau, an immigrant
was defined for statistical purposes as an alien coming to the United States for
permanent residence. The number of immigrants reported for 1892 covered
continental United States only (Alaska excluded), included only arrivals
at United States seaports, and counted only steerage class passengers, not
cabin class passengers. It was also stated that admissions over the land borders and
coastwise travel from Canada or Mexico were not included. 

The Bureau of Statistics continued to issue its own immigration statistics 
through the fiscal year 1895, and the two series, therefore, overlapped for the 
four years 1892 to 1895 inclusive. Comparison of the number of immigrants 
reported in the two series is given below. 











Fiscal Year   Bureau of Statistics   Bureau of Immigration
1892               623,084                579,663
1893               502,917                439,730
1894               314,467                285,631
1895               279,948                258,536

54 From Quarterly Report of the Chief of the Bureau of Statistics, 1890-91, No. 8, p. 608. Somewhat different
totals are given in the original reports for these years.

55 Concerning the later treatment of reentries, see below under the Bureau of Immigration series.

56 The June 30 fiscal year has been maintained to the present time.

For the four years as a whole the Bureau of Immigration series was somewhat
more than 9 per cent below the older series.
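The "somewhat more than 9 per cent" figure can be reproduced by pooling the four years of the table. A minimal Python sketch, with illustrative names:

```python
# Immigrants reported for the overlap years by the two agencies.
BUREAU_OF_STATISTICS = {1892: 623_084, 1893: 502_917, 1894: 314_467, 1895: 279_948}
BUREAU_OF_IMMIGRATION = {1892: 579_663, 1893: 439_730, 1894: 285_631, 1895: 258_536}


def shortfall_pct(older: dict[int, int], newer: dict[int, int]) -> float:
    """Per cent by which the newer series falls below the older, pooled over shared years."""
    years = older.keys() & newer.keys()
    older_total = sum(older[y] for y in years)
    newer_total = sum(newer[y] for y in years)
    return 100 * (older_total - newer_total) / older_total


print(sum(BUREAU_OF_STATISTICS.values()), sum(BUREAU_OF_IMMIGRATION.values()))  # 1720416 1563560
print(round(shortfall_pct(BUREAU_OF_STATISTICS, BUREAU_OF_IMMIGRATION), 2))   # 9.12

# Year-by-year gaps, computed the same way on one year at a time.
for year in sorted(BUREAU_OF_STATISTICS):
    one_year = shortfall_pct({year: BUREAU_OF_STATISTICS[year]}, {year: BUREAU_OF_IMMIGRATION[year]})
    print(year, round(one_year, 1))
```

Pooled over 1892 to 1895 the shortfall is about 9.1 per cent; taken year by year it ranges from about 7 per cent in 1892 to about 12.6 per cent in 1893.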

This large discrepancy has not been fully explained, but it is presumably 
due wholly or in large part to two changes in the compilation of the immigra- 
tion statistics. In the first place, the reporting of immigration was changed from 
the arrival basis used by the Bureau of Statistics to an admissions basis. That 
is to say, aliens debarred or refused admission to the United States were not in- 
cluded in the immigration statistics in the first years of the Bureau of Im- 
migration series. (For later treatment, see Section 4 below, Arrivals or admis- 
sions.) The number debarred at seaports in the fiscal years 1892 to 1894, and 
therefore not included in the immigration totals, was as follows: 


The number debarred is not large relative to total immigration, and was even 
smaller in the preceding decade. 

The second and more important change from the Bureau of Statistics prac- 
tice was to define an immigrant for statistical purposes as an alien steerage 
passenger. Cabin class passengers were not included as immigrants for a number 
of years after the beginning of the Bureau of Immigration series, even if
coming for permanent residence; and there is at least indirect evidence that alien
steerage passengers were counted as immigrants regardless of their intended 
duration of stay in the United States. (See further under Section 2 below, 
Cabin and steerage passengers.) Davis considers the difference between the two 
series to be due largely to the omission of cabin passengers from the new series, 
and writes in explanation that the Bureau of Statistics: 

. . . based its count on returns from customs officers who, unlike immigration officers,
are more concerned with cabin than with steerage passengers. It seems likely
that some of the ‘temporary’ arrivals included by the Bureau of Statistics but excluded
by the Immigration Bureau were cabin aliens intending to reside in the United
States and therefore properly considered immigrants.⁵⁷


It is assumed, therefore, that the lower count of immigration given by the 
Bureau of Immigration in its early years represents an understatement of the 
number of immigrants; and Davis suggests an upward revision by 8 per cent as
a reasonable correction for the years before cabin passengers were included.⁵⁸
(See further below.)

57 International Migrations, Vol. II, p. 651.

58 Ibid.

59 Information on the number of cabin class passengers was presumably collected from 1855 onward, for the
Act of that year called for report of “the part of the vessel occupied” by each passenger, and the Act of August 2,
1882 required a separate listing of cabin passengers. See Appendix C.

2. Cabin and steerage passengers.—The term cabin passengers was used to refer
to the several classes of passengers who were not in the steerage class. The
publications of the Bureau of Statistics and the Bureau of Immigration do not
explain fully how passengers of this group were treated statistically,⁵⁹ and this
remains one of the more obscure points in the interpretation of the immigration
statistics. As mentioned above in the notes on the Bureau of Statistics series,
cabin passengers were included as immigrants in at least some of the early
years of that series, but the statistical treatment of these passengers cannot be 
traced through the series. The Bureau of Immigration, as described above, 
counted only steerage passengers as immigrants in its first year of operation, 
and it did not expand the operating definition of immigrant to include cabin 
passengers until some years later. Davis, who gave particular attention to this 
matter, stated her information as follows: 


No point in the interpretation of the statistics is more baffling than the question 
whether these ‘cabin aliens’ were included in the totals. New York counted only 
steerage aliens as immigrants in 1895, and at other ports presumably the practice 
was the same. The total for all ports in 1897, 1898 and 1899 is definitely said to include 
only steerage passengers. The report for 1899 estimates that there were in that year 
25,000 aliens who arrived first or second class, ‘who intended to remain here and who 
would have been classified as immigrants had they traveled in the steerage.’ We have 
good evidence that, until after 1903, cabin aliens were not included as a rule among 
the immigrants, although as early as 1900 protests were mentioned in the reports 
against the class discrimination that led immigration officers to consider all steerage 
aliens immigrants, but exempted cabin aliens from inspection. A passage in the 1903 
Report suggests that the total number of cabin aliens entering that year (64,269) 
were immigrants, though they were not so counted. Probably cabin aliens were in- 
cluded beginning with 1904, but in 1904 and 1905 there seems to have been some 
confusion between cabin aliens and aliens in transit (who supposedly were not classed 
as immigrants). As late as 1912 the inspection and consequently to some extent the 
count of immigrant aliens traveling in the first or second class were said to be in- 
adequate. 


The number of alien cabin passengers, none of whom were counted as im- 
migrants, can be computed from or was given in the Annual Reports for 1899 to 
1903. The totals were as follows: 








Year   Steerage Immigrants   Alien Cabin Passengers
1899         311,715              49,721 (a)
1900         448,572              65,635
1901         487,918              74,950 (b)
1902         648,743              82,055 (b)
1903         857,046              64,269 (b)

(a) This figure obtained by subtraction of steerage immigrants from the alien passenger total.
(b) Reported as “other alien passengers.”


At this time all aliens arriving by steerage were inspected and were counted as 
immigrants, whether or not they were coming for permanent residence.⁶¹ There
is little basis for estimating how many of the aliens in cabin class were in fact 
immigrants; but the immigration officials at the time were convinced that con- 
siderable numbers of aliens came by cabin class to avoid inspection and possible 
exclusion. The Annual Report for 1899 gave the estimate that 25,000 or one-half 
of the cabin class aliens of that year were immigrants (i.e., were coming for 
permanent residence).⁶²
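The table also shows how large the uncounted cabin class was relative to the counted steerage immigrants. The Python sketch below (names are illustrative) computes that ratio year by year and pooled over 1899 to 1903. Because it treats every alien cabin passenger as a potential immigrant, it is a ceiling on what the cabin class could have added in these years, not a reconstruction of Willcox's estimate, whose basis is not stated in the text. It also checks the 1899 estimate of 25,000 against the cabin total for that year.

```python
# Steerage immigrants and alien cabin passengers, fiscal years 1899-1903.
STEERAGE_IMMIGRANTS = {1899: 311_715, 1900: 448_572, 1901: 487_918, 1902: 648_743, 1903: 857_046}
ALIEN_CABIN = {1899: 49_721, 1900: 65_635, 1901: 74_950, 1902: 82_055, 1903: 64_269}


def cabin_to_steerage_pct(year: int) -> float:
    """Alien cabin passengers as a per cent of counted (steerage) immigrants."""
    return 100 * ALIEN_CABIN[year] / STEERAGE_IMMIGRANTS[year]


for year in sorted(STEERAGE_IMMIGRANTS):
    print(year, round(cabin_to_steerage_pct(year), 1))

# Pooled over the five years.
pooled = 100 * sum(ALIEN_CABIN.values()) / sum(STEERAGE_IMMIGRANTS.values())
print(round(pooled, 1))  # 12.2

# The 1899 official estimate of 25,000 cabin-class immigrants as a share of all cabin aliens.
print(round(100 * 25_000 / ALIEN_CABIN[1899], 1))  # 50.3
```

The yearly ratio falls from about 16 per cent in 1899 to about 7.5 per cent in 1903, and 25,000 is indeed very close to one-half of the 49,721 cabin aliens of 1899.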





60 Ibid., pp. 650-651.

61 In some Annual Reports, including that for 1902, the terms “immigrant,” “steerage alien,” and “alien steerage
passenger” were used interchangeably.

62 Annual Report, 1899, p. 3.

The Annual Report for 1902 noted with satisfaction that the manifesting
requirement was now extended to cabin class passengers, at least at the port
of New York, in spite of protest from the steamship companies. To quote:

As stated in former reports of the Bureau, the ‘other alien passengers’ are those, so 
far as such figures refer to the port of New York, who came by cabin, the officers in
charge at said port avoiding the alleged embarrassments arising from an examination
of that class of travel by holding that the laws referred to ‘immigrants’ only, and that
‘immigrants’ do not travel in the cabins, but avail themselves of the cheaper rates at
which steerage passage is offered. It seems superfluous to comment upon a construction
of the law which in effect holds that a diseased alien can secure exemption from
the excluding provisions of the law by simply paying extra the difference between the 
cabin and the steerage rates of transportation. It is therefore with much satisfaction 
that the Bureau finds itself in a position to report that under the present efficient 
commissioner of the New York station the absurd distinction referred to has been 
abolished, and the steamship companies are required to furnish complete manifests 
as prescribed by law, of all alien passengers coming, or ‘immigrating’ to the United 
States on their vessels.” 


No change in the statistical treatment of cabin passengers was reported in 
1903. Thereafter cabin passengers were presumably included as immigrants 
if they were aliens who declared the intention of taking permanent residence in 
the United States; but as Davis notes, there is not full assurance that there was 
uniform practice in the counting of such aliens during the next few years. It is 
worth noting also that inclusion of aliens of cabin class in the immigration 
statistics and their inspection by the immigration authorities depended on 
their declaration of intention to take permanent residence in the United States. 

Willcox estimates that the inclusion of cabin class aliens increased the number
of immigrants by nearly 12 per cent.⁶⁴ Unfortunately, the reports for 1904
and after did not give the numbers of passengers and immigrants that came by 
cabin class. 

3. Land border arrivals.—The Bureau of Immigration in its first year of
operation continued the established practice of counting only arrivals at sea- 
ports and of omitting arrivals from Canada and Mexico whether via the land 
borders or by coastwise travel. In its first Annual Report, however, the Bureau 
expressed its concern at the uninspected and uncounted immigration from 
Canada, saying that “immigration from Canada to the United States has be- 
come so considerable that it is worthy the attention of Congress,” and added 
the estimate that 60,000 Canadians crossed the border into New England every 
spring.⁶⁵ It was believed at the time that aliens who might have been excluded
on arrival at seaports of the United States were coming by way of Canada to 
avoid inspection. 

63 Annual Report, 1902, p. 16.

64 International Migrations, Vol. II, p. 92.

65 The Superintendent of Immigration in his report to the Secretary of the Treasury for the fiscal year 1892
added that the provision for inspection at the land borders contained in the recent Act of March 3, 1891 was so
restricted as to be “practically inoperative.” (1892 Annual Report, p. 11.)

The wording of the section of the Act dealing with land border inspection was as follows: “sec. 8... the
Secretary of the Treasury may prescribe rules for inspection along the borders of Canada, British Columbia, and
Mexico so as not to obstruct or unnecessarily delay, impede, or annoy passengers in ordinary travel between said
countries ....”

In the report for the following year, 1893, mention was made of the large
number of immigrants from Canada who arrived at Boston and were not
counted in the immigration total. In the latter part of 1893 an agreement was
concluded with the Canadian government and the steamship lines using Cana- 
dian ports, and provision was made for inspection by representatives of the 
Bureau of Immigration stationed at designated Canadian ports of all arriving 
aliens destined for the United States. This inspection began in October, 1893, 
and the immigration total reported for the fiscal year ended June 30, 1894 in- 
cluded 7,771 immigrants who came by way of Canada. This number is much 
below the report by the Canadian authorities of an annual average of almost 
80,000 immigrant arrivals at Canadian ports en route to the United States dur- 
ing the years 1885 to 1891 inclusive, and the estimate of 40,000 to 50,000 during 
the 12 months preceding October, 1893. 

The information from Canadian sources was cited by the Bureau of Immi- 
gration as follows: 

The minister of agriculture of the Dominion of Canada, in his official report for the 
calendar year 1891, reports the following immigrant passengers as arrived at Canadian 
ports from European countries en route for the United States during the following 
calendar years: 


Statistics of immigrants entering the United States from Canada have not been 
kept prior to October, 1893, but from Canadian sources it is ascertained that from 
40,000 to 50,000 Europeans entered the United States who landed at Quebec and 
Halifax during the 12 months preceding that date.” 


The much lower number reported for the fiscal year 1894 by the Bureau of 
Immigration representatives in Canada may have been for the reason that they 
were authorized to inspect and report only those arriving alien passengers
who declared their intention of proceeding to the United States. Seven years
later the Bureau explained that: 

No statistics of arrivals from foreign contiguous territory are compiled except of 
such aliens from transoceanic countries as are manifested on the ships’ lists as destined 
to the country, and of those from the same sources who enter the United States 
within thirty days after arrival at the ports of such foreign contiguous country.68


The reported numbers of arrivals via Canadian seaports for the fiscal year 
1894 and the immediately following years were as follows: 


13,853 





66 Annual Report for 1892, p. 30.
67 Annual Report for 1894, p. 19.
68 Annual Report for 1901, p. 4. Italics added.
During the same four years immigrant arrivals at United States seaports
numbered 1,084,960. The reported arrivals via Canada thus added an indicated 
3 per cent or a little more to the volume of immigration; but to apply this 
percentage as an upward adjustment of the pre-1894 statistics in allowance for 
immigration via Canada would be a minimum correction in view of the much
larger figures from Canadian sources and the reasons for suspecting incomplete-
ness of the Bureau of Immigration data on arrivals in Canada.

The practice of stationing immigration inspectors at Canadian ports 
achieved the inspection and counting of at least a portion of the immigrants
coming by way of Canada, but it failed to include migrants who originally in- 
tended to reside in Canada but later entered the United States and those 
migrants who did not declare their intention of proceeding to the United States. 
Neither could it include Canadians who migrated to the United States. At this 
time an increasing effort was being made to exclude certain types of aliens con- 
sidered to be undesirable; and the Annual Reports of the Bureau of Immigration 
during the years immediately preceding and following 1900 frequently pointed 
out that the lack of effective control of immigration from the adjacent coun-
tries, Canada and Mexico, permitted easy evasion of the immigration laws.69

The first year in which land border arrivals were included in the immigra- 
tion totals, except for the sporadic reporting that continued from 1855 to 1886, 
was 1904. In the Annual Report for that year an unspecified number of arrivals 
at Canadian border stations was combined with the arrivals at Canadian sea- 
ports, and arrivals of immigrants at two Mexican border stations in Texas were 
also included. (See detailed notes.) The number of land border stations was 
gradually expanded, and is considered to have been completed by 1908. As
Davis rightly points out, however, a complete counting of immigration via the 
land borders has probably never been achieved, and illegal entry by that and 
other routes has continued to be a source of error in the immigration statistics. 

For the four years 1904 to 1907 inclusive, total immigration and that via 
Canadian seaports and the land borders was as follows:








Fiscal year    Total immigration    Via Canada and the land borders
   1904             812,870                    31,195
   1905           1,026,499                    46,478
   1906           1,100,735                    46,769
   1907           1,285,349                    54,181











During these years the number of migrants coming by way of Canadian sea- 
ports and the land borders was about 4.4 per cent of that entering through the 
seaports of the United States. 
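The 4.4 per cent figure can be reproduced from the table above if the base is taken to be arrivals at seaports of the United States, i.e., total immigration less the arrivals via Canada and the land borders. The following Python sketch rests on that reading of the denominator, which is an assumption; measured against total immigration, the same arrivals come to about 4.2 per cent.

```python
# Fiscal years 1904-1907, transcribed from the table above:
# year -> (total immigration, arrivals via Canadian seaports and the land borders)
ARRIVALS_1904_1907 = {
    1904: (812_870, 31_195),
    1905: (1_026_499, 46_478),
    1906: (1_100_735, 46_769),
    1907: (1_285_349, 54_181),
}


def via_canada_percentages(data):
    """Return (per cent of seaport arrivals, per cent of total immigration)."""
    total = sum(t for t, _ in data.values())
    via = sum(v for _, v in data.values())
    seaports = total - via  # assumed base: arrivals at U.S. seaports only
    return 100 * via / seaports, 100 * via / total


of_seaports, of_total = via_canada_percentages(ARRIVALS_1904_1907)
print(f"{of_seaports:.1f} per cent of seaport arrivals")  # 4.4 per cent of seaport arrivals
print(f"{of_total:.1f} per cent of total immigration")    # 4.2 per cent of total immigration
```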

The recording of immigration at Canadian seaports and via the land borders 
described above primarily gave information about arrivals of Europeans by 
way of the adjoining countries, and it did not include natives and residents of 





69 See for example the Annual Report for 1898, pp. 37-38, and the Annual Report for 1899, pp. 31-32.
Canada and Mexico. As previously described, the counting of immigrants from
Canada and Mexico was suspended in 1885, and they were not routinely in-
cluded in the immigrant total thereafter until the fiscal year 1908, following the
Act of February 20, 1907, (34 Stat. 898) and the regulations of July 1, 1907. 
Reported on the basis of country of last permanent residence, however, some 
Canadians were included in the immigration statistics from 1886 onward, and 
Mexicans from 1894 onward. The numbers of these, from 1886 to 1907 inclu-
sive, are as follows:70








Canada and 


Year Newfound- Mexico Canada and 


land Newfoundland 





1886 17 291 
1887 9 352 
1888 15 ~ 1,322 
1889 28 . 396 
1890 183 . 540 
1891 234 “ 636 
1892 (*) . 1,058 
1893 (*) 2 ,837 
1894 194 2,168 
1895 244 5,063 
1896 278 19,918 




















* Not separately reported. 


The basis for the inclusion of these immigrants from Canada and Mexico is not 
stated. Although regulations requiring the reporting of this immigration from
the adjacent countries did not appear until 1907, it is evident that a change to- 
ward the counting of these immigrants began several years before. 

In the absence of more direct information, a rough estimate of the deficit in 
the immigration statistics before 1908 due to the omission of Canadians and 
Mexicans can be made by observing the proportion of such immigrants during 
the following years. For the four-year period 1908 to 1911 inclusive, the num- 
bers were as follows: 








Year    Canada and Newfoundland    Mexico    All countries
1908            38,510              6,067        782,870
1909            51,941             16,251        751,786
1910            56,555             18,691      1,041,570
1911            56,830             19,889        878,587
Total          203,836             60,898      3,454,813








During these four years, therefore, immigrants from the two countries added 
about 8.3 per cent to the volume of immigration. 
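The statement that these immigrants “added” about 8.3 per cent appears to measure them against immigration from all other countries; as a share of the all-countries total they come to about 7.7 per cent. A short Python sketch of both calculations from the 1908-1911 table (the interpretation of “added” is an assumption):

```python
# Fiscal years 1908-1911, transcribed from the table above
CANADA_NEWFOUNDLAND = [38_510, 51_941, 56_555, 56_830]
MEXICO = [6_067, 16_251, 18_691, 19_889]
ALL_COUNTRIES = [782_870, 751_786, 1_041_570, 878_587]

contiguous = sum(CANADA_NEWFOUNDLAND) + sum(MEXICO)  # 264,734
total = sum(ALL_COUNTRIES)                           # 3,454,813
other_countries = total - contiguous

# Share of the reported total vs. increment over immigration from elsewhere
share_of_total = 100 * contiguous / total
increment = 100 * contiguous / other_countries

print(f"{share_of_total:.1f} {increment:.1f}")  # 7.7 8.3
```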
4. Arrivals or admissions.—With the growth of the number of excludable 


70 The numbers below are those now given by the Immigration and Naturalization Service, and are not in all
cases identical with those originally given in the Annual Reports. (See detailed notes, Appendix A.)
classes in the immigration legislation of the 1880’s and 1890’s an increasing
number of arriving aliens were refused admission. For its first three fiscal years, 
1892 to 1894 inclusive, the new Bureau of Immigration reported immigration 
on an admissions basis. During the next three years the Bureau of Immigration 
reports returned to an arrival basis but gave the numbers of would-be immi- 
grants who were debarred. For these three years, 1895 to 1897 inclusive, the 
numbers arrived, debarred, and admitted were as follows: 








                         1895       1896       1897
Immigrants arrived     258,536    343,267    230,832
Immigrants debarred      2,419      2,799      1,880
Immigrants admitted    256,117    340,468    228,952











It is the larger figure of arrivals, not admissions, that appears in the official 
immigration series for these years. Use of the admission instead of the arrival 
basis in the years 1895 to 1897 would have reduced the totals on the average 
by somewhat more than 0.8 per cent. In the following fiscal year, 1898, the 
reporting of immigration is said to have changed permanently to an admissions 
basis, but the Annual Reports do not clearly indicate an admissions basis of 
reporting until 1904 or 1905. 
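The reduction can be recomputed from the table of arrivals and debarments. The text does not say how the average was taken, so the Python sketch below shows both a pooled rate (total debarred over total arrived) and the mean of the three annual rates; both exceed 0.8 per cent.

```python
# Fiscal years 1895-1897, transcribed from the table above
ARRIVED = {1895: 258_536, 1896: 343_267, 1897: 230_832}
DEBARRED = {1895: 2_419, 1896: 2_799, 1897: 1_880}

# Admissions are arrivals less debarred aliens
admitted = {year: ARRIVED[year] - DEBARRED[year] for year in ARRIVED}

# Pooled reduction: total debarred as a per cent of total arrivals
pooled = 100 * sum(DEBARRED.values()) / sum(ARRIVED.values())

# Alternative: the mean of the three annual percentages
annual = [100 * DEBARRED[y] / ARRIVED[y] for y in ARRIVED]
mean_annual = sum(annual) / len(annual)

print(admitted)                           # {1895: 256117, 1896: 340468, 1897: 228952}
print(f"{pooled:.2f} {mean_annual:.2f}")  # 0.85 0.86
```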

5. Aliens in transit.—The classification by destination of the 229,299 im- 
migrants admitted in 1898 indicates that 2,422 or slightly more than 1 per cent 
were in transit through the United States.71 According to Willcox:

Before January 1, 1903, an alien arriving to traverse the United States on his way 
to some other country was deemed an immigrant, but after that date he was classed 


as a non-immigrant alien. This change excluded apparently about three per cent of 
the arriving immigrants.” 


This statement cannot be confirmed by examination of the Annual Reports 
from 1899 to 1902 inclusive. In these years destinations within the United 
States were given for all the steerage class aliens who were considered to be 
immigrants. Cabin class passengers were not classified as immigrants, and no 
information was given on their destinations or intended duration of stay. 

A category of “aliens in transit” was reported for the fiscal years 1903 and 
1904, and was excluded from the immigration total. Evidently the phrase was 
used in a different sense than now. According to the Annual Report for 1905,73
for example, alien arrivals were as follows: 








                            1903       1904
Immigrants                857,046    812,870
Aliens in transit          64,269     27,844
Total alien passengers    921,315    840,714














71 Annual Report for 1898, p. 11.
72 International Migrations, Vol. II, p. 92.
73 Table II, p. 5.
Used in this sense to designate alien passengers who were not immigrants, the
“aliens in transit” phrase presumably included temporary visitors as well as 
those who were, strictly speaking, in transit through the United States to some 
other destination; but in fact it probably consisted of cabin class passengers, at 
least in 1903.74 

6. Reentry of aliens.—In his Annual Report for 1897, the Commissioner-
General of Immigration called attention to the fact that the arriving aliens who 
were classified as immigrants were not all permanent additions to the population 
of the United States, but that many of them departed at some later time and 
that there was no official record of the number of departures. He further stated 
that “about one-half only of those who come remain in the country and become 
permanent residents. Some aliens come and go so often that old officials at the 
immigrant stations recognize them.” The immigration statistics published in 
the Annual Report for 1896 included information on the number of the arriv- 
ing immigrants who had been in the United States before. The numbers of these 
reentries as reported annually thereafter until the statistical series was dis- 
continued in 1909 were as follows: 

















Year    Immigrants    Reentries    Reentries as per cent of immigrants
1896      343,267       48,804*            14.2
1897      230,832       44,476             19.3
1898      229,299       42,596             18.6
1899      311,715       47,896             15.4
1900      448,572       52,136             11.6
1901      487,918       58,182             11.9
1902      648,743       61,595              9.5
1903      857,046       76,702              8.9
1904      812,870      103,750             12.8
1905    1,026,499      175,624             17.1
1906    1,100,735      133,624             12.1
1907    1,285,349       74,282              5.8
1908      782,870       63,128              8.1
Total   8,565,715      982,795             11.5

* Port of New York only.


The reentries whose numbers are given above were included in the immigration 
totals. It is probably safe to assume that they had also been counted as im- 
migrants on their previous arrivals. 
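The per cent column and the overall figure of 11.5 per cent can be recomputed from the first two columns of the reentry table. A minimal Python check, in which the dictionary is simply a transcription of the table; note that the 1896 reentries cover the Port of New York only, while the immigrants for that year are for all ports.

```python
# year -> (immigrants, reentries), transcribed from the table above.
# The 1896 reentry figure covers the Port of New York only.
REENTRIES = {
    1896: (343_267, 48_804), 1897: (230_832, 44_476),
    1898: (229_299, 42_596), 1899: (311_715, 47_896),
    1900: (448_572, 52_136), 1901: (487_918, 58_182),
    1902: (648_743, 61_595), 1903: (857_046, 76_702),
    1904: (812_870, 103_750), 1905: (1_026_499, 175_624),
    1906: (1_100_735, 133_624), 1907: (1_285_349, 74_282),
    1908: (782_870, 63_128),
}


def reentry_percentages(table):
    """Reentries as a per cent of immigrants, rounded to one decimal place."""
    return {year: round(100 * back / arrived, 1) for year, (arrived, back) in table.items()}


total_immigrants = sum(arrived for arrived, _ in REENTRIES.values())
total_reentries = sum(back for _, back in REENTRIES.values())
overall = round(100 * total_reentries / total_immigrants, 1)

print(reentry_percentages(REENTRIES)[1897])         # 19.3
print(total_immigrants, total_reentries, overall)   # 8565715 982795 11.5
```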

Not included in the above totals for the years 1906 to 1908 inclusive are the
returning alien residents of the United States, who were classified as
nonimmigrants during and after 1906. Their numbers in these years were as follows:








74 Compare with Section 2 above. It is perhaps the 1904 data that led Willcox to believe that transit aliens
added 3 per cent to the immigration totals.
75 Annual Report for 1897, p. 5; also in the Annual Report for 1896, pp. 11-12.
Year    Nonimmigrant alien residents of U. S.
1906         20,776
1907         88,887
1908         86,814
1909        138,915








Also classed as nonimmigrants and excluded from the immigration statistics 
in 1906 and after were aliens admitted for a period of less than one year. (See 
further below, Section 8, Redefinition of Immigrants.) 

7. Continental United States and insular immigration.—The immigration
statistics give admissions to continental United States up to and including the 
fiscal year 1900. Honolulu appears as a port of entry for the first time in the 
fiscal year 1901, and the immigration total for that year includes 1,774 immi- 
grants admitted to Hawaii. During the immediately preceding years migration 
from Hawaii to continental United States had been included in the immigration 
statistics as follows: 


1898: 256, of whom 134 were Hawaiians
1899: 67, of whom 67 were Hawaiians
1900: 6, of whom 5 were Hawaiians


In 1902 San Juan, Puerto Rico, was added as a port of entry, and contributed 
792 immigrants to the total. Alaska, which had appeared from time to time in 
the immigration record since 1871,76 was regularly included as a place of entry
from 1904 onward; and immigration to the Virgin Islands was first included in 
1942. 

For the decade 1901 to 1910 inclusive, continental, insular, and total im- 
migration was as follows: 

















Year    Continental*    Insular        Total
1901       486,144        1,774        487,918
1902       638,081       10,662        648,743
1903       840,376       16,670        857,046
1904       802,128       10,742        812,870
1905     1,012,908       13,591      1,026,499
1906     1,089,830       10,905      1,100,735
1907     1,259,832       25,517      1,285,349
1908       771,257       11,613        782,870
1909       748,752        3,034        751,786
1910     1,035,834        5,736      1,041,570
Total    8,685,142      110,244      8,795,386

* Includes arrivals via Canadian ports.


During this period, migration into the insular possessions contributed about 
1.3 per cent of the total volume of immigration. Migration into the Philippine 





76 See detailed notes, Appendix A.













992 AMERICAN STATISTICAL ASSOCIATION JOURNAL, DECEMBER 1958 


Islands was recorded from 1910 to 1931, but was not included in the totals for 
the United States. Since May 1, 1934 arrivals from the Philippine Islands have 
been treated the same as arrivals from other countries. 
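The figure of 1.3 per cent for insular immigration in 1901-1910 can be checked against the table of continental and insular immigration. The Python sketch below also confirms that the continental and insular figures sum to the reported total in each year; the exact share works out to about 1.25 per cent, which rounds to the 1.3 per cent given in the text.

```python
# year -> (continental, insular, reported total), transcribed from the table above
IMMIGRATION_1901_1910 = {
    1901: (486_144, 1_774, 487_918),
    1902: (638_081, 10_662, 648_743),
    1903: (840_376, 16_670, 857_046),
    1904: (802_128, 10_742, 812_870),
    1905: (1_012_908, 13_591, 1_026_499),
    1906: (1_089_830, 10_905, 1_100_735),
    1907: (1_259_832, 25_517, 1_285_349),
    1908: (771_257, 11_613, 782_870),
    1909: (748_752, 3_034, 751_786),
    1910: (1_035_834, 5_736, 1_041_570),
}

# Years whose components do not add up to the reported total (expected: none)
mismatches = [y for y, (c, i, t) in IMMIGRATION_1901_1910.items() if c + i != t]

insular = sum(i for _, i, _ in IMMIGRATION_1901_1910.values())
total = sum(t for _, _, t in IMMIGRATION_1901_1910.values())
insular_share = 100 * insular / total

print(mismatches, insular, total, f"{insular_share:.2f}")  # [] 110244 8795386 1.25
```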

8. Redefinition of immigrants.—A redefinition of terms was instituted begin- 
ning with the fiscal year 1906 in order to secure more precisely defined categories. 
Aliens admitted to the United States were classified into the two categories of 
immigrant and nonimmigrant. As explained in the Annual Report of the Com- 
missioner-General of Immigration, immigrant aliens were those “who intend 
to settle here,”77 and the nonimmigrants were those “who avowed an intention
not to settle in the United States, and all returning to resume domiciles for-
merly acquired in this country.”78 It was further explained that, for comparison
with preceding years, the total number of immigrants plus nonimmigrants in 
1906 corresponded to the number of immigrants plus aliens in transit reported 
previously. The total number of aliens admitted in 1906 was 1,166,353 of 
whom 1,100,735 were classed as immigrants and 65,618 nonimmigrants. Of the 
latter 20,616 gave the United States as both their last permanent residence and 
their destination. The remaining 45,002 gave other countries as their destina-
tions.79
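The 1906 figures are internally consistent, as a short Python check shows: immigrants plus nonimmigrants equal the total admitted, and the two destination groups account for all the nonimmigrants. Nonimmigrants were thus a little under 6 per cent of all aliens admitted in that year.

```python
# Fiscal year 1906 admissions, as given in the text
immigrants = 1_100_735
nonimmigrants = 65_618
total_admitted = 1_166_353

# Nonimmigrants split by destination
us_residence_and_destination = 20_616
other_destinations = 45_002

assert immigrants + nonimmigrants == total_admitted
assert us_residence_and_destination + other_destinations == nonimmigrants

# Nonimmigrants as a per cent of all aliens admitted
print(f"{100 * nonimmigrants / total_admitted:.1f}")  # 5.6
```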

Beginning with the fiscal year 1908, alien departures as well as arrivals were 
recorded by the immigration authorities. Four categories of alien arrivals and 
departures were reported, these being immigrant, nonimmigrant, emigrant, 
and nonemigrant. According to the explanation given in the Annual Report 
of the Commissioner-General of Immigration: 

In making the classification, the following rule is observed: Arriving aliens whose 
permanent domicile has been outside the United States who intend to reside per- 
manently in the United States are classed as immigrant aliens; departing aliens whose 
permanent residence has been in the United States who intend to reside permanently 
abroad are classed as emigrant aliens; all alien residents of the United States making
a temporary trip abroad and all aliens residing abroad making a temporary trip to


the United States are classed as nonimmigrant aliens on the inward journey and non-
emigrant on the outward.80


In practice, the phrase “permanent domicile” was not taken in an absolute 
sense but rather to mean a residence of one year or more. Thus, an alien coming 
to reside in the United States for one year or more was classed as an immigrant; 
and an alien resident of the United States who went abroad and remained for 
a year or more was counted again as an immigrant on his return.
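The four-way rule, with “permanent” read as one year or more, can be expressed as a small decision function. This is a hypothetical encoding for illustration only: the names `classify` and `domiciled_in_us`, and the use of 365 days as the one-year threshold, are assumptions and do not describe the Bureau's actual procedures.

```python
from enum import Enum


class Category(Enum):
    IMMIGRANT = "immigrant"
    NONIMMIGRANT = "nonimmigrant"
    EMIGRANT = "emigrant"
    NONEMIGRANT = "nonemigrant"


# Assumption: "permanent" applied in practice as one year or more, taken here as 365 days
ONE_YEAR_DAYS = 365


def domiciled_in_us(lived_in_us: bool, days_abroad: int = 0) -> bool:
    """Hypothetical helper: an absence of a year or more ends a U.S. domicile."""
    return lived_in_us and days_abroad < ONE_YEAR_DAYS


def classify(arriving: bool, us_domicile: bool, intended_stay_days: int) -> Category:
    """Hypothetical encoding of the 1908 rule.

    arriving: True for an inward journey, False for an outward one.
    us_domicile: whether the alien's permanent domicile is in the United States.
    intended_stay_days: intended stay in the U.S. (arrivals) or abroad (departures).
    """
    permanent = intended_stay_days >= ONE_YEAR_DAYS
    if arriving:
        # Domiciled abroad and coming to stay a year or more -> immigrant
        return Category.IMMIGRANT if (not us_domicile and permanent) else Category.NONIMMIGRANT
    # Domiciled in the U.S. and leaving for a year or more -> emigrant
    return Category.EMIGRANT if (us_domicile and permanent) else Category.NONEMIGRANT


# A resident returning after 400 days abroad is counted again as an immigrant
print(classify(True, domiciled_in_us(True, days_abroad=400), 3650))  # Category.IMMIGRANT
# A resident returning after a 60-day trip is a nonimmigrant
print(classify(True, domiciled_in_us(True, days_abroad=60), 3650))   # Category.NONIMMIGRANT
```

Treating a long absence as loss of domicile in a separate helper keeps `classify` a direct transcription of the quoted rule, while the helper carries the practical one-year interpretation described in the text.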

The above definitions of immigrant and nonimmigrant have remained sub- 
stantially unchanged from 1908 to the present time. According to a later and 
somewhat more explicit definition applicable from 1933 onward: 

An immigrant alien is a nonresident alien admitted to the United States for 
permanent residence. ... A nonimmigrant alien is an alien resident of the United 
States returning from a temporary visit abroad, or nonresident alien admitted to the 


United States for a temporary period. Included in this group are visitors, transients, 
treaty merchants, students, foreign government officials, officials to international 





77 Annual Report for 1906, p. 4.

78 Ibid., p. 45.

79 Ibid., Table XV, p. 48. The number of aliens who reported the United States to be their place of last per-
manent residence was 20,776.

80 Annual Report for 1909, p. 9.
organizations, and the wives and unmarried children of these groups. Travelers
between the United States and insular possessions are not included in the count of 
nonimmigrants, nor are commuters and others who frequently cross the inter- 
national land boundaries. In general, aliens admitted to the United States at land 
boundaries for 30 days or more are included in the statistics. Agricultural laborers 
admitted to the United States under the Act of April 29, 1943, as amended, have
been included in the statistics if they came from the West Indies. However, agricul-
tural and railway track laborers admitted from Mexico have not been included in
the statistics as nonimmigrants.81


9. Illegal entry.—In addition to recorded immigration there are illegal and 
unrecorded entries, whose numbers are unknown but at times have been 
thought to be considerable. In addition to illegal entries across the land borders 
or by sea or air at other than designated places of entry, aliens legally admitted 
for a temporary stay may fail to maintain the status under which they were 
admitted, and thus become illegal and unrecorded immigrants. Deserting alien 
seamen are an additional category of illegal entries. 

Although the immigration authorities have some information on illegal 
entries, especially on deserting alien seamen and failures to maintain the status 
of admission, and report the expulsions of aliens illegally in the United States, 
there is no satisfactory way of estimating the number of illegal entries.82


E. Summary 


The following summary gives the definition of immigrant that was applied 
for statistical purposes at different periods of the official immigration series, as 
far as it has been possible to discover what practice was followed in the com- 
pilation of the statistics. The changes that occurred from time to time in the 
applied definition are noted, as are known classes of omissions and inclusions in 
the immigration statistics. Estimates of the effect of the changes, omissions, and 
inclusions, where reasonable estimates can be made, are given in parentheses. 
References are to sections of the foregoing text where fuller information is given. 


1820-1867 inclusive 


Definition. —Alien or foreign-born passengers embarked on vessels arriving 
at seaports of the United States. 

Deaths at sea.—Probably not consistently removed from the passenger totals. 
(From one to two deaths per 1,000 passengers, 1856 to 1860.) See Part II B, 
Sections 2 and 3. 

Births at sea.—No information (probably negligible). See Part II B, Section 
2; also detailed notes for 1871 to 1891 inclusive. 

Temporary visitors and aliens in transit—Included in the passenger totals. 
(Estimated at from 1? per cent to 2 per cent of passenger arrivals before 1856; 
reported for years 1856-1867 inclusive when they constituted about 14 per 
cent of alien passenger arrivals.) See Part II B, Section 3. 

Land border arrivals.—Not included, 1820 to 1854 inclusive, except for a 

81 U. S. Bureau of the Census, Historical Statistics of the United States, 1789-1945 (Washington: Government
Printing Office, 1949), p. 20.
82 For an account of the various categories of legal and illegal entries and departures see Hutchinson and Rubin,
“Estimating the Resident Alien Population of the United States,” Journal of the American Statistical Association,
42 (September, 1947), pp. 385-400.
few arrivals during fiscal years 1820-1823 inclusive. Partial count of land border
arrivals began in 1855. (Jarvis estimate of arrivals from Canada, 1815 to 1860, 
Part II B, Section 6.) 

Illegal entry.—Slaves smuggled in and aliens landed in evasion of head tax 
and bonding requirements among the unreported arrivals. (Estimate of mini- 
mum of 270,000 slaves brought in, 1808 to 1860; no estimate of other illegal 
entries.)

Fiscal year.—Variable reporting period; see Part II B, Section 4. 


1868-1891, inclusive 


Definition. —Arriving alien or foreign-born passengers intending to remain in 
the United States, i.e., same as in preceding period except on an arrival basis 
and temporary visitors not included. (Immigrants reported on this basis 14 per 
cent fewer than the number of alien passengers reported on the preceding basis, 
1867 to 1870 inclusive.) See Part II C, Sections 1 and 2. 

Deaths on voyage.—Presumably removed; immigration as reported is stated 
to be on the basis of arrivals. 

Land border arrivals.—Incomplete counting of land border arrivals up to end
of fiscal year 1885; thereafter no count of land border arrivals, nor of Canadian 
and Mexican immigrants either at seaports or at land borders. (Canadians and 
Mexicans 14.6 per cent of immigrants before 1886; for Canadian estimates of 
migration to the United States, 1885-1893, see Part II D, Section 3; and see 
same section for the numbers of Canadian and Mexican immigrants when they 
were again recorded.) 

Reentry of aliens.—Aliens counted as immigrants on each reentry, unless
temporary visitors. (No estimate of reentries.) See Part II C, Section 5. 

Debarred aliens.—Included in the immigration total, which was reported on 
an arrival basis. 

Illegal entry.—No information. 


1892-1908, inclusive 

Definition.—Aliens coming by steerage class who were admitted to the
United States, i.e., changed from preceding basis of reporting by limitation to 
steerage class passengers and by omission of debarred aliens from the immigra- 
tion totals except as noted below. (The change to this basis reduced the reported 
number of immigrants by more than 9 per cent, 1892-1895 inclusive.) See Part 
II D, Section 1. 

Debarred aliens.—Included in the immigration totals for the fiscal years 1895 
to 1897 inclusive, definitely excluded after 1904 or 1905. (For numbers debarred 
1895 to 1897 see Part II D, Section 4.)

Land border entries.—No land border entries included, and few migrants
from Canada and Mexico, whether arriving by land or at seaports. However, 
the immigration reported for the fiscal year 1894 and after included immigrants 
arriving at Canadian ports who declared their intention of proceeding to the 
United States. (For immigration via Canada after 1903 and of Canadian and 
Mexican immigrants, 1886 to 1907, see Part II D, Section 3.)

Temporary visitors and aliens in transit.—Probably included as immigrants 
if coming by steerage class. A category of “aliens in transit” was excluded from
the 1903 immigration total, but appears in fact to have consisted of cabin class 
alien passengers. See Part II D, Section 5. 

Reentry of aliens.—Aliens counted as immigrants on each entry if steerage 
passengers. (Varied from about 9 per cent to 19 per cent of all immigrants, 
1896-1903.) See Part II D, Section 6. 

Insular immigration.—Included 1901 and after, Hawaii in 1901, Puerto Rico 
in 1902. See Part II D, Section 7. 


1904-1907, inclusive 

Definition.—Aliens admitted to the United States who intend to reside there;
changed from the preceding period by the inclusion of cabin class passengers. 
(Willcox estimate of nearly 12 per cent increase in the number of immigrants.) 
See Part II D, Sections 2 and 8. 

Land border arrivals.—Reporting of land border arrivals began in 1904 and 
the number of land border stations increased up to 1908. But Canadian and 
Mexican immigrants not regularly counted until 1908. (For 1904-1907 added 
4.4 per cent to immigration by way of seaports of the United States.) See Part 
II D, Section 3. 

Insular immigration.—Ketchikan in Alaska added permanently as port of
entry in 1904. (Immigration to the insular possessions was about 1.3 per cent of
total immigration, 1901-1910.) See Part II D, Section 7.

Reentry of aliens.—From 1906 on, aliens returning to resume a previously
acquired domicile in the United States not counted as immigrants. See Part II 
D, Section 8. 


1908 and after 


Definition.—“Arriving aliens whose permanent domicile has been outside the
United States who intend to reside permanently in the United States,” with
permanent residence or domicile defined to mean residence of one year or more.
See Part II D, Section 8.

Canadian and Mexican immigrants.—Included 1908 and after. (For the 
years 1908 to 1911 inclusive, added 8.3 per cent to the number of immigrants.) 
See Part II D, Section 3. 

Insular immigration.—Reported total is immigration to continental United 
States and possessions. Virgin Islands added as place of entry in 1942. 

It is evident that the official series of immigration statistics could be ad- 
justed in order to give somewhat greater comparability from one period to 
another. For example, debarred aliens could be removed from the totals for the 
latter part of the 1880’s and for the several years in the next decade when they 
were again included in the reported totals, and temporary visitors could be de- 
ducted for the years when they were distinguished in the Department of State 
material, 1856 to 1867. Examination of the surviving manifests and comparison 
of them with the reported totals perhaps would give more precise information 
of the practices followed in the compilation of the data, such as the statistical 
treatment of native-born and naturalized citizens in the first decades of the 
series, and might indicate the proportion of temporary visitors and returning 
resident aliens at times when they were included in the immigration totals. 
A number of other adjustments and corrections could be made, but although
greater year-to-year comparability might be achieved in this way, the result 
would not be an adequate measure of immigration to the United States. To 
apply correction factors, such as for the proportion of temporary visitors or 
immigrants from foreign contiguous territory, would give estimates of immigra- 
tion that would become increasingly tenuous as they were extended back from 
the years on which the corrections were based. This does not in itself argue 
against making adjustments of the reported totals in order to estimate the 
amount of immigration, but more serious is the large amount of unrecorded 
immigration that cannot be estimated satisfactorily. Included in this unknown
category are the cabin class passengers during the years immediately preceding 
and following 1900 among whom the temporary visitors and the true immi- 
grants were not separated, the uncounted arrivals over the land borders during 
the years before land border stations were established, and the unknown num- 
bers of illegal entries from 1819 to the present time. 


APPENDIX A: DETAILED NOTES ON THE IMMIGRATION STATISTICS FOR SINGLE YEARS 





Fiscal Reported Notes 
Year Immigration (Source: Bromwell, 1820-1855) 





(ended Sept. 30) 

1820 8,385 10,311 passengers, of whom 1,926 “born in the United States.”
Returns for 34 ports or places, including Sandusky, Ohio (14 
arrivals). 

1821 9,127 11,644 passengers, of whom 2,517 born in the United States; and 
283 reported at Oswegatchie, N. Y. 

1822 6,911 8,549 passengers, of whom 1,638 born in the United States; and 87 
reported at Oswegatchie, N. Y. 

1823 6,354 8,265 passengers, of whom 1,911 born in the United States; and 
55 reported at Oswegatchie, N. Y. 

1824 7,912 9,627 passengers, of whom 1,715 born in the United States. 

1825 10,199 12,858 passengers. 

1826 10,837 13,908 passengers.

1827 18,875 21,777 passengers.

1828 27,382 30,184 passengers. Returns from only 10 ports of arrival.

1829 22,520 24,513 passengers. Returns from 20 ports of arrival.

1830 23,322 24,837 passengers. Returns from 15 ports of arrival.

1831 22,633 23,880 passengers.

(15 months ended Dec. 31)

1832 60,482 This consists of 53,179 passengers not born in the United States
who were reported for the fiscal year ended Sept. 30, 1832 (total
passengers, 54,351), plus 7,303 passengers reported for the quarter
ended Dec. 31, 1832. Only Boston and Charlestown in Massa-
chusetts and New York City reported for the additional quarter,
and did not classify the passengers according to country of birth.

(ended Dec. 31)

1833 58,640 59,925 passengers.

1834 65,365 67,948 passengers.

1835 45,374 48,716 passengers.

1836 76,242 80,972 passengers.

1837 79,340 84,959 passengers.
Fiscal Reported
Year Immigration 





1838 38,914 45,159 passengers. 

1839 68,069 74,666 passengers.

1840 84,066 92,207 passengers.

1841 80,289 87,805 passengers.

1842 104,565 110,980 passengers.

(9 months ended Sept. 30)

1843 52,496 56,529 passengers.

(ended Sept. 30)

1844 78,615 84,764 passengers.

1845 114,371 119,896 passengers.

1846 154,416 158,649 passengers. Galveston added as port of entry, with 354
passengers reported.

1847 234,968 239,482 passengers.

1848 226,527 229,483 passengers.

1849 297,024 299,683 passengers.

(15 months ended Dec. 31)

1850 369,980 The reported total consists of 310,004 passengers not born in the
United States, who were reported for the fiscal year ended Sept.
30, 1850, plus an additional 59,976 during the quarter ended Dec.
31, 1850. (315,334 passengers during the fiscal year, plus 65,570
during the next quarter.) San Francisco added as port of entry,
reporting 43,615 passengers during the fiscal year ended Sept. 30,
but making no returns for the following quarter or the next three
years.

(ended Dec. 31)

1851 379,466 408,828 passengers.

1852 371,603 397,343 passengers.

1853 368,645 400,982 passengers. Astoria, Oregon, added as port of entry.

1854 427,833 460,474 passengers.

1855 200,877 230,476 passengers.


Inland places: Oswego, N. Y.—5,072 arrivals, of which 3,142
“belong in the United States,” and 3,270 intend to reside there.
LaSalle, Texas—74 arrivals, 4 of which belong in the United
States, and all intend to reside there (no Mexicans included).
Source, 1856-1867, inclusive: State Department reports.
1856 200,436 Total passengers
Born in the U. S.


“Country where they mean to reside’: 
United States 
Other countries (specified) 
Not stated 


On the assumption that all the native born intended to reside in 
the United States, at least 4,179 temporary visitors are indicated. 


Died on voyage: 400. 

The deaths are not classified by nativity, and are presumably 
included in the reported total of 224,496 passengers. (See detailed 
notes for 1857, and Part II B, Section 3 of text.) 





® For later revised data in conformity with the Bureau of Statistics series, see Part II B, Section 3 of the text. 
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Fiscal Reported 


Year Immigration vanes 





1856 Continued Inland places: Oswego, N. Y.—3,063 passengers, of whom 1,694 
“belong in the United States” and 1,757 intend to reside there. 
1857" 251,306 Total passengers 
Born in the U. S.


(Includes 21,060 passengers of not stated nativity who were 
assumed to be aliens.) 


Country where they mean to reside: 
United States 243,562
Other countries (specified) 3,937 
Not stated 24,483 
A tabulation given for the first time indicates that the reported 
passenger totals were embarkations, not arrivals: 


Arrivals of passengers in the U. S. 271,558
Died on the voyage 


Total number embarking at foreign ports for the 
United States 271,982 


Inland places: at Oswego, N. Y., 832 passengers, of whom 270
belong in the United States, and 644 intend to reside there.
1858" 123,126 Total passengers 

Born in the U. S.


(Includes 462 passengers of not stated nativity.) 


Country where they mean to reside: 
United States 


Arrivals of passengers in the U. S.
Died on the voyage 


Total number embarking at foreign ports for the 
United States 
Inland places: Oswego—1,092 passengers, of whom 477 belong in 
the U. S., and 764 intend to reside there.
Detroit—3,050 passengers, all of whom are of foreign origin 
and intend to reside in the United States. 
1859%*7 121,282 The reporting of passengers changed in 1859 from an embarkation 
to an arrival basis. 
Total passenger arrivals 155,302
Born in the U. S.


121,075 


An additional 207 died on the voyage, out of a total of 155,509
embarkations; these deaths are included in the immigration total.





™ For later revised data in conformity with the Bureau of Statistics series, see Part II B, Section 3 of the text. 
% Ibid. 

% The number of temporary visitors or nonimmigrant aliens is given elsewhere as 3,371. 

® For later revised data in conformity with the Bureau of Statistics series, see Part II B, Section 3 of the text.





1859 Continued 


For those embarked, the intended future place of residence was: 


United States 150,824
Other countries (specified) 2,523
Not stated 2,162
155,509


Inland places: Oswego—320 passengers, of whom 70 belong in the 
United States, and 285 intend to reside there. 

Detroit: 1,677 passengers, all of whom are of foreign origin and
intend to reside in the United States. 








1860%* 153,640 Total passenger arrivals 179,691
Born in the United States 26,051
Aliens 153,640
An additional 222 died on the voyage. 
For those who arrived, the intended future place of residence was 
as follows: 
United States 173,491
Other countries (specified) 3,182
Not stated 3,018
179,691
Inland places: Oswego—498 passengers (second quarter only), of 
whom 163 belong in the United States, and 240 intend to reside 
there (an additional 232 of not stated place of residence). 
1861°* 91,918 No summary tables. No returns south of Baltimore on the Atlantic 
Coast; returns for New Orleans and San Francisco. 
Inland places: Oswego—returns only for third quarter when 714 
arrivals, of which 336 belong in the United States and 411 intend 
to reside there. 
1862*° 91,985 Total passengers 114,475. No returns from southern ports except 
Key West in Florida. No returns for inland places except Oswego, 
where 1,523 passenger arrivals, of which 724 belong in the United 
States and 857 intend to reside there. 
1863" 176,282 Total passengers 199,811. No inland arrivals reported. Key 
West the only southern port included. 
1864% 193,418 Total passenger arrivals 221,535, and 3 deaths on the voyage. 
No inland arrivals reported. 
1865" 248,120 According to the report of the Secretary of State: 
Total passengers 287,397
Natives of the United States 38,794
Aliens 248,603
Returns included for Beaufort and Charleston, S. C., Savannah,
Ga., and New Orleans.
88 Ibid.
89 Ibid.
% Ibid.
% Ibid.


% Ibid. 
% Ibid. 





1865 Continued Inland places: Cuyahoga district, Ohio 
Detroit (3d and 4th quarters) 
Chicago (3d and 4th quarters) 
Minnesota district (4th quarter) 
1866" 318,568 The report of the Secretary of State covered only the first three 
quarters of the year. 
Total passengers 
Born in the U. S.


Aliens 


Inland arrivals: 
Cuyahoga district, Ohio 
Detroit 


Minnesota district 
1867"% 315,722 Last year in which the Department of State data are used in the 
present official series. The report of the Secretary of State (40th 
Congress, 2d Session, House Executive Document No. 18) was 
for a fiscal year ended Sept. 30, 1867. The total now used has been 
adjusted to the calendar year basis of the preceding years. 


For the period ended Sept. 30, 372,461 passenger arrivals were 
reported, of whom 332,302 were aliens. Elsewhere in the same 
report it was stated that the number of immigrants, those “who 
arrived with the purpose of remaining,” was 327,183, and that 
3,762 aliens arrived “on their transit to another country.” This is
the first report of aliens in transit. 


Inland arrivals were reported from Illinois, Michigan, and the 
Cuyahoga district in Ohio. The southern ports contributing to the 
immigration total were Norfolk, Charleston, and Savannah. 


Bureau of Statistics data for the calendar year 1867, which in- 
cluded alien passengers in the first six months and immigrants 
(alien passengers intending to remain) in the second six months, 
and which had been revised from the figures originally released by 
the Bureau, gave a lower total: 

Quarter ended March 31 

Quarter ended June 30 

Quarter ended Sept. 30 

Quarter ended Dec. 31 


295,642
(from Monthly Summary of Commerce and Finance, series 1902-03,
No. 12, p. 4362).
(6 months ended June 30)
1868 138,840 Source, 1868-1891, inclusive: Bureau of Statistics reports


Immigration statistics from the Bureau of Statistics, Treasury 
Department; and the reported total is the number of immigrants, 
defined as the number of arriving alien passengers intending to 
remain in the United States. 





1868 Continued As originally given in the Monthly Reports, arrivals in the six 
months ended June 30, 1868 were: 


(ended June 30) 
1869 352,768 


352,569

Total passengers 389,651

Of the 389,651 passengers, 53,342 were cabin class.
1870 387,203 Arrivals as originally given in the Monthly Reports:


Citizens returning to the U. S.
Foreigners not intending to remain 
Immigrants 


Total passengers 


Of the 436,496 passengers, 94,272 were cabin class. 
1871 321,350 Citizens returning to the U. S.

Foreigners not intending to remain 

Immigrants 


Total passengers 


Of the 386,271 passengers, 78,320 were cabin class. 

Included among the 321,350 immigrants were 29 born at sea. 
Passenger arrivals but no immigrants were reported for Alaska. 
1872 404,806 Citizens returning to the U. S.

Aliens not intending to remain 

Immigrants 


Total passenger arrivals 


Of the 472,034 passengers, 84,197 were cabin class. 

Included among the 404,806 immigrants were 121 born at sea. 
Passenger arrivals including 20 immigrants reported for Alaska. 
1873 459,803 Citizens returning to the U. S. 47,744
Aliens not intending to remain 13,338
Immigrants 459,803
Total passenger arrivals 520,885


Of the 520,885 passengers, 73,802 were cabin class. 

Included among the 459,803 immigrants were 138 born at sea. 
1874 313,339 Citizens returning to the U. S. 47,730
Aliens not intending to remain 14,610
Immigrants 313,339
Total passenger arrivals 375,679





1874 Continued Passengers not classified according to class of passage. 
Included among the 313,339 immigrants were 90 born at sea. 
1875 227,498 Citizens returning to the U. S. 50,898
Aliens not intending to remain 17,134
Immigrants 227,498
Total passenger arrivals 295,530


Included among the 227,498 immigrants were 55 born at sea. 
1876 169,986 Citizens returning to the U. S. 47,986
Aliens not intending to remain 20,019
Immigrants 169,986
Total passenger arrivals 237,991


Included among the 169,986 immigrants were 23 born at sea. 
1,899 passenger arrivals, including 1,213 aliens but no immigrants, 
reported for Alaska. 
1877 141,857 Citizens returning to the U. S.
Aliens not intending to remain
Immigrants 141,857
Total arrivals 206,503


Included among the 141,857 immigrants were 21 born at sea. 
1,674 passenger arrivals but no immigrants reported for Alaska. 
1878 138,469 Citizens returning to the U. S. 41,671
Aliens not intending to remain 19,307
Net immigration 138,469
Total arrivals 199,447


Included among the 138,469 immigrants were 13 born at sea. 
1,407 passenger arrivals but no immigrants reported for Alaska. 
1879 177,826 Citizens returning to the U. S. 55,256
Aliens not intending to remain 20,128
Immigrants 177,826
Total passenger arrivals 253,210


Included among the 177,826 immigrants were 31 born at sea. 
No report for Alaska. 

1880 457,257 No summary published in the reports for the fiscal year. Passenger
arrivals and immigration as reported separately for the four 
quarters gave a somewhat lower total immigration, as follows: 


Citizens returning to the U. S. 50,205
Aliens not intending to remain 26,497
Immigrants 457,243
Total passenger arrivals 533,945

Included among the 457,243 immigrants were 60 born at sea.
1881 669,431 Citizens returning to the U. S.





1881 Continued Included among the 669,431 immigrants were 86 born at sea and
13 “picked up at sea.”
1882 788,992 Citizens returning to the U. S.


Total passenger arrivals 


Included among the 788,992 immigrants were 99 born at sea.


Total passenger arrivals 712,515 


The above totals are those given in the summary for the fiscal 
year. The original quarterly reports gave a total of 603,046 immi- 
grants. 


Included among the immigrants were 74 born at sea. 
18 arrivals, all citizens of the United States, were reported for 
Alaska. 
1884 518,592 Citizens returning to the U. S.
Aliens not intending to remain
Immigrants
Total passenger arrivals 649,491


Included among the 518,592 immigrants were 86 born at sea. 
1885 395,346 Citizens returning to the U. S. 97,251
Aliens not intending to remain 42,412
Immigrants 395,346
Total passenger arrivals 535,009


Included among the 395,346 immigrants were 67 born at sea. 
Of the 535,009 passengers, 106,362 were cabin class. 
1886 334,203 Citizens returning to the U. S. 86,380
Aliens not intending to remain 24,078
Immigrants 334,203
Total passenger arrivals 444,661


Included among the 334,203 immigrants were 55 born at sea. 
Of 448,684 passengers (sum of quarterly reports), 103,400 were 
cabin class. 
The following arrivals at inland places were included: 
Arrivals Immigrants 
1 
2 
147 
40 


1887 490,109 Citizens returning to the U. S. 92,347
Aliens not intending to remain 22,929
Immigrants 490,109
Total passenger arrivals 605,385
1887 Continued Included among the 490,109 immigrants were 63 born at sea.
Of the 605,385 passengers, 112,202 were cabin class.
No arrivals reported for inland places.

1888 546,889 Citizens returning to the U. S.
Aliens not intending to remain 


Total passenger arrivals 


Included among the 546,889 immigrants were 57 born at sea. 
Of the 663,039 passengers, 122,760 were cabin class. 
1889 444,427 Citizens returning to the U. S. 81,241
Aliens not intending to remain
Immigrants 444,427
Total passenger arrivals 546,513


Included among the 444,427 immigrants were 55 born at sea. 
Of the 546,513 passengers, 119,477 were cabin class. 
1890 455,302 Citizens returning to the U. S. 88,017
Aliens not intending to remain 21,123
Immigrants 455,302
Total passenger arrivals 564,442


Included among the 455,302 immigrants were 50 born at sea. 
Of the 564,442 passengers, 121,650 were cabin class. 

158 arrivals, including 6 immigrants, reported for Alaska. 
1891 560,319 Citizens returning to the U. S.
Aliens not intending to remain
Immigrants
Total passenger arrivals 668,337


Included among the 560,319 immigrants were 63 born at sea. 

Of the 668,337 passengers, 132,725 were cabin class. 

This is the last year for which the Bureau of Statistics data are 
used in the present official series. 

1892 579,663 Source: 1892 to date, Annual Reports of the Bureau of Immigration
and its successors, now the Immigration and Naturalization Service. 


The reported total of 579,663 was the number of “immigrants 
inspected and admitted into the United States.” It did not include
the 2,164 arriving aliens who were debarred. (See p. 10 of Annual 
Report for 1893.) 


In addition, the following arrivals were reported: 

Citizens returning to the U. S.

Nonimmigrant aliens 
This, together with admissions and debarments, gives an indicated
total of 695,403 passenger arrivals, but on p. 33 the number
of passenger arrivals is given as 736,660.


Only arrivals at seaports were reported, and no natives of Canada 
or Mexico were included. 

1893 439,730 The total of 439,730 was reported as the number of “immigrants
arrived and inspected” (p. 3), and also as the number of immi- 
grants admitted (pp. 4-5). 
1893 Continued Elsewhere it was made clear that the reported total was on an
admissions basis (pp. 9-10): 


Immigrants arrived 440,793
Immigrants debarred 1,063
Immigrants admitted 439,730


No natives of Canada or Mexico are shown in the classification
of immigrants by nativity (pp. 4-5), but may have been included
in the “all other countries” category.


Footnote of Table 1, p. 3: “In addition to the number here classified,
28,108 immigrants arrived (at Boston) from Dominion of
Canada, of which number 52 were debarred.” 

1894 285,631 The total of 285,631 was reported on the same basis as in the
preceding year. 








Immigrants arrived 288,020
Immigrants debarred 2,389
Immigrants admitted 285,631
Immigrant arrivals, U. S. ports 277,860
Immigrant arrivals, Canadian ports 7,771
285,631


pp. 17-18. Text of agreement for inspection at Canadian ports. 
p. 19. “Statistics of immigrants entering the United States from 
Canada have not been kept prior to October, 1893, but from Cana- 
dian sources it is ascertained that from 40,000 to 50,000 Europeans 
entered the United States who landed at Quebec and Halifax 
during twelve months preceding that date.” 

1895 258,536 The reported total of 258,536 was the number of aliens “arrived
and inspected,” and included those refused admission. 








Alien arrivals, U. S. continental ports 252,548
Alien arrivals, Canadian ports 5,988
Total 258,536
Number debarred 2,419
Number of immigrants admitted 256,117


Of the 258,536 immigrant arrivals, 116 were natives of Mexico, 
and 239 of British North America (p. 7). 

1896 343,267 The reported total of 343,267 was on the same basis as in the
preceding year, i.e., the number of immigrants arrived and in- 
spected, not the number admitted. 





Alien arrivals, U. S. continental ports 334,346
Alien arrivals, Canadian ports 8,921
Total 343,267
Number debarred 2,799
Number of immigrants admitted 340,468
1896 Continued Of the 343,267 immigrant arrivals, 150 were natives of Mexico,
and 273 of British North America (p. 4). 


The report noted (pp. 11-12) that according to data on steerage 
travel about half of the immigrants departed from the United 
States. 


The following data on alien arrivals at the port of New York 
during the fiscal year were given in the attached report of the U.S. 
Commissioner of Immigration at that port (p. 29): 

Have been in the U. S. before
Going to their immediate families
Actual immigration

Total immigration 263,709 (sic)


1897 230,832 The total of 230,832 was reported as arrivals, not admissions. Of
the arrivals, according to the Annual Report, “228,952 were
landed and 1,880 were debarred and deported.”* It was further
noted that “it is not to be assumed that all of the 228,952 immigrants
who arrived and were admitted became residents of the
United States,” and that according to the transatlantic steamship 
companies 129,872 steerage passengers sailed from the port of 
New York during the fiscal year (p. 5). 

The arrivals were distributed as follows: 
Continental United States ports 
Canadian ports

230,832


A new category, “Have been in the United States before, and are 
returning,” contained 44,476 immigrants, and was reported for 
the first time (p. 17). 


By nationality, 91 of the immigrants were Mexican, and 290 
British North American or Canadian (p. 14). 

1898 229,299 The total of 229,299 was variously reported as the number of
immigrants arrived (p. 3) and the number landed (p. 19). The 
number debarred was 3,030. 


Included in the reported immigration total were the following: 


Have been in the U. S. before and are returning 42,596
Of Mexican nativity or nationality 107
Of British North American nativity or nationality 350
Aliens in transit 2,422
Arrivals via Canadian ports 10,737
1899 311,715 The total of 311,715 was reported as immigrant arrivals at ports of 
the United States and Canada. The number debarred at ports of 
the United States was 3,798, and an additional 1,099 were refused 
admission at the land borders (pp. 8-9). The total number of alien 
passenger arrivals was 361,436 (United States and Canada, but 
for the latter presumably includes only passengers destined for 
the United States). 





* The total of 1,880 consists of 1,617 refused admission at seaports, and 263 returned within one year after landing.
Annual Report for 1902, p. 11. 
1899 Continued Included in the reported immigration total were the following:


Have been in the U. S. before 47,896
Of Mexican nationality 161
Of British North American nationality 1,322
Arrivals via Canadian ports 13,853


It was stated in the Annual Report (p. 3) that 25,000 cabin class 
aliens, who intended to remain and who would have been classed 
as immigrants if in steerage class, came during the fiscal year. 

1900 448,572 The total of 448,572 was reported as the number of immigrants
arrived at ports of the United States and Canada. The number de- 
barred at ports of the United States was 4,246, and an additional 
1,616 were refused admission from foreign contiguous territory 
(i.e., at the Canadian and Mexican land borders). 


The total number of alien passenger arrivals, including those at 
Canadian ports destined for the United States, was 514,207; and 
of these the 448,572 steerage passengers were classified as immigrants.
The cabin class passengers numbered 65,635 (p. 14).


Included in the reported immigration total were the following: 


Have been in the U. S. before 52,136
Country “whence they came”:
Mexico 237
British North America 396
Arrivals via Canadian ports 23,200


1901 487,918 The total of 487,918 was reported as the number of immigrants
arrived at ports of the United States and Canada. The number 
debarred at ports of the United States was 3,516, and an additional 
1,696 were refused admission from foreign contiguous territory 
(p. 8). The total number of alien passenger arrivals, at United 
States and Canadian ports, was 562,868, of whom the 75,950 in 
cabin class were classed as “other alien passengers.”


Included in the reported immigration total were the following: 


Have been in the U. S. before 58,182
Country whence they came:
Mexico 345
British North America 540
Arrivals via Canadian ports 25,220
Arrivals at Honolulu (first time reported) 1,774


1902 648,743 The total of 648,743 was reported as the number of immigrants
arrived in the United States. The number debarred at seaports was 
4,974, and an additional 5,437 were refused admission from 
foreign contiguous territory.


The number of alien passenger arrivals in the United States was 
730,798, of whom 82,055 “so far as such figures refer to the port
of New York” were in cabin class and therefore not classified as 
immigrants (p. 16). 
1902 Continued Included in the reported immigration total were the following:


Have been in the U. S. before
Country whence they came:

648,743


1903 857,046 The total of 857,046 was reported as the number of immigrants
arrived in the United States, and was also identified as “steerage 
immigration” (p. 5). The number debarred at seaports was 8,769, 
and an additional 9,922 were refused admission from foreign con- 
tiguous territory. 
In addition to the immigrants were 64,269 “aliens in transit,” 
which together with the immigrants gave 921,315 alien passenger 
arrivals. 


Included in the reported immigration total were the following: 


Have been in the U. S. before
Country whence they came:

35,920


1904 812,870 The total of 812,870 was variously reported as the number of aliens 
arrived in the United States (p. 15), and the number admitted 
(pp. 18-20). The number of alien passenger arrivals was 840,714, 
consisting of 27,844 “aliens in transit” in addition to the immi- 
grants (p. 5). 
The number debarred at seaports of the United States was 7,994, 
and at the Canadian and Mexican borders was 6,856 (pp. 9-10). 
Included in the reported immigration total were the following: 


Have been in the U. S. before
Country whence they came:
British North America
Tourists (p. 18)

Alien arrivals, U. S. continental ports
Alien arrivals, U. S. insular ports:
1904 Continued


1905 1,026,499 


1906 1,100,735 




Ketchikan, Alaska, with 20 alien arrivals, was included among the 
continental ports. 


Land border arrivals were expressly included in the immigration 
total for the first time since 1885. There were 481 at El Paso and 
340 at Eagle Pass in Texas. The arrivals at Canadian border sta- 
tions were not reported separately but were included with those at 
Canadian seaports (p. 4). 

The total of 1,026,499 was reported as the number of aliens, 
exclusive of aliens in transit, admitted into the United States. 
The number in transit was 33,256, and the total number of alien 
passengers who were admitted was 1,059,755. 


The number debarred at the ports of the United States and 
Canada was 11,480, and an additional 5,804 were refused admis- 
sion from foreign contiguous territory. 


Included in the reported immigration total were the following: 
Have been in the U. S. before 175,624
Country whence they came: 

British North America 
Tourists (pp. 21-23) 


By place of admission: 
U. S. continental ports
Honolulu

1,026,499


Included in the continental total was Ketchikan with 5 aliens ad- 
mitted, and admissions at land border stations. 

Change of terminology and of composition of the immigration 
statistics: the total of 1,100,735 was reported as the number of 
“immigrant aliens admitted,” rather than the former “aliens
admitted.” The Annual Report (p. 4) explained the change as follows:
“In the year 1905 all aliens arriving at ports of this country, 
with the exception of those merely in transit to other countries, 
were reported as alien arrivals. During the fiscal year 1906 there 
have been segregated from those arriving, not only the transits, 
but all aliens returning from visits abroad to resume previously 
established permanent domiciles in the United States, and all 
coming simply as visitors or tourists with the intention of return- 
ing to homes abroad.” 


Included among the immigrant aliens admitted: 


Have been in the U. S. before 133,624
(Probably included the 32,897 immigrant aliens admitted who 
gave the United States as their last permanent residence.) 


By country of last permanent residence: 
1906 Continued


1907 1,285,349 


1908 782,870


Not included among the immigrant aliens admitted: 


Nonimmigrant aliens admitted 
(Of these, 20,776 gave the United States as place of last perma- 
nent residence.) 


Aliens debarred at seaports 
Citizens of foreign contiguous territory refused 
admission (13 at seaports) 


By place of arrival, the immigrant aliens admitted included: 
Puerto Rico 


Mexican border stations 

Canadian border stations 

Canadian seaports 
The basis of reporting was immigrant aliens admitted, as in 
preceding year. 


Included among the immigrant aliens admitted: 


Have been in the U. S. before
Country of last permanent residence:
Mexico
British North America 
United States 


Not included among the immigrant aliens admitted: 


Nonimmigrant aliens admitted 153,120
(Of these, 88,887 gave the United States as place of last perma- 
nent residence (p. 48).) 


Aliens debarred at seaports 
Citizens of foreign contiguous territory refused 
admission (101 at seaports) 


By place of arrival, the immigrant aliens included: 
Puerto Rico 
Honolulu 


Mexican border stations 

Canadian border stations 

Canadian seaports 
The reported total of 782,870 was the number of immigrant aliens 
admitted, as in the preceding year. 


Included among the immigrant aliens admitted: 


Have been in the U. S. before 
Country of last permanent residence: 


British North America 
United States 
Not included among the immigrant aliens admitted: 


Nonimmigrant aliens admitted 141,825
(Of these, 86,814 gave the United States as place of last perma- 
nent residence.) 
1908 Continued Aliens debarred at seaports
Citizens of foreign contiguous territory refused 
admission (28 at seaports) 


By place of arrival, the immigrant aliens included: 
Puerto Rico 
Honolulu 


Mexican border stations 
Canadian border stations 
Canadian seaports 


Outward alien movement: 
Emigrant aliens 395,073
Nonemigrant aliens 319,755


Total departed 714,828 
1909 751,786 The total of 751,786 was reported as the number of aliens ad- 
mitted, and was not shown to have included aliens who had been 
in the United States before. 


Included among the immigrant aliens admitted: 


Not included among the immigrant aliens admitted: 


Nonimmigrant aliens admitted 192,449 
(Of these, 159,331 had been in the U. S. before and 138,915
gave the United States as place of last permanent residence.) 


Aliens debarred at seaports and land border 
stations 


By place of arrival, the immigrant aliens included: 
Puerto Rico 
1,876 
202 
16,162 
Canadian border stations 53,703
Canadian seaports 12,562 


Outward alien movement: 
Emigrant aliens 
Nonemigrant aliens 174,590 


Total departed 
1910 1,041,570 The terminology and basis of reporting was the same as in the 
preceding year. Information on the number of aliens who had 
been in the United States before was discontinued. 


Included among the immigrant aliens admitted: 


Country of last permanent residence: 
1910 Continued Not included among immigrant aliens admitted:
Nonimmigrant aliens admitted 
(Of these, 94,369 gave the United States as place of last perma- 
nent residence.) 


By place of arrival, the immigrant aliens included: 
Puerto Rico 


Mexican border stations 
Canadian border stations 
Canadian seaports 


Outward alien movement: 
Emigrant aliens 
Nonemigrant aliens 177,982


No detailed notes are given for the immigration statistics after 1910, which remained 
quite uniform in composition except as noted in the text section. 





APPENDIX B. LIST OF PUBLICATIONS CONTAINING THE ORIGINAL REPORTS OF 
IMMIGRATION STATISTICS, 1820-1891 


State Department Reports, 1820-1870*
Year ended Sept. 30:

1820 16th Congr. 2nd Sess. Senate Document
1821 17th Congr. 1st Sess. House Document
1822 17th Congr. 2nd Sess.
1823 18th Congr. 1st Sess.
1824 18th Congr. 2nd Sess.
1825 19th Congr. 1st Sess.
1826 19th Congr. 2nd Sess.
1827 20th Congr. 1st Sess.
1828 20th Congr. 2nd Sess.
1829 21st Congr. 1st Sess.
1830 21st Congr. 2nd Sess.
1831 22nd Congr. 1st Sess.
1832 22nd Congr. 2nd Sess.

Calendar Year:
1833 23rd Congr. 1st Sess.
1834 23rd Congr. 2nd Sess.
1835 24th Congr. 1st Sess.
1836 24th Congr. 2nd Sess.
1837 25th Congr. 2nd Sess.
1838 25th Congr. 3rd Sess.
1839 26th Congr. 1st Sess.
1840 26th Congr. 2nd Sess.
1841 27th Congr. 2nd Sess.
1842 27th Congr. 3rd Sess.
1843

* From International Migrations, Vol. II, p. 658, with revision.
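Year ended Sept. 30:

1844 28th Congr. 2nd Sess. House Executive Document No. 13
1845 29th Congr. 1st Sess. House Executive Document No. 216
1846 29th Congr. 2nd Sess. House Executive Document No. 98
1847 30th Congr. 1st Sess. House Executive Document No. 47
1848 30th Congr. 2nd Sess. House Executive Document No. 10
1849 31st Congr. 1st Sess. House Executive Document No. 7
1850 31st Congr. 2nd Sess. House Executive Document No. 16
Calendar Year:
1851% 32nd Congr. 1st Sess. House Executive Document No. 100
1852 32nd Congr. 2nd Sess. House Executive Document No. 45
1853 33rd Congr. 1st Sess. House Executive Document No. 78
1854 33rd Congr. 2nd Sess. House Executive Document No. 77
1855 34th Congr. 1st Sess. House Executive Document No. 29
1856 34th Congr. 3rd Sess. House Executive Document No. 78
1857 35th Congr. 1st Sess. House Executive Document No. 62
1858 35th Congr. 2nd Sess. House Executive Document No. 92
1859 36th Congr. 1st Sess. House Executive Document No. 32
1860 36th Congr. 2nd Sess. House Executive Document No. 81
1861 37th Congr. 2nd Sess. House Executive Document No. 111
1862 37th Congr. 3rd Sess. House Executive Document No. 67
1863 38th Congr. 1st Sess. House Executive Document No. 53
1864 38th Congr. 2nd Sess. House Executive Document No. 76
1865 39th Congr. 1st Sess. House Executive Document No. 65

1866 (first three quarters only): 39th Congr. 2nd Sess. House Executive Document No. 39
Year ended Sept. 30:
1867 40th Congr. 2nd Sess. House Executive Document No. 18
Calendar Year:
1868-9 41st Congr. House Executive Document No. 235
1870 41st Congr. 3rd Sess. House Executive Document No. 92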


Bureau of Statistics Reports, 1867-1891 


Year ended June 30
1868 1st quarter, Monthly Report of Director of Bureau of Statistics, No. 11, p. 11
2nd quarter, No. 15, p. 16
3rd quarter, No. 18, p. 13
4th quarter, No. 21, p. 16
1869 1st quarter, Monthly Report of Director of Bureau of Statistics, No. 23, p. 65
2nd quarter, No. 25, p. 142
3rd quarter, No. 26, p. 178
4th quarter, Monthly Report of the Deputy Special Commissioner of the Revenue, in charge of the Bureau of Statistics, No. 1, Series 1869-70, p. 19
1870 1st quarter, Monthly Report on the Commerce and Navigation of the United States, 1869-70, pp. 118-121
2nd quarter, pp. 206-209
3rd quarter, pp. 411-414
4th quarter, pp. 491-494
1871 1st quarter, Monthly Report on the Commerce and Navigation of the United States, 1870-71, pp. 86-89
2nd quarter, pp. 190-193
3rd quarter, pp. 292-295
4th quarter, pp. 393-396
Summary for year, pp. 396-398





% For 15 month period, Oct. 1, 1850 to Dec. 31, 1851. 
1872 1st quarter, Monthly Report on the Commerce and Navigation of the United States, 1871-72, pp. 98-102
2nd quarter, pp. 238-242
3rd quarter, pp. 393-397
4th quarter, pp. 565-570
Summary for year, pp. 571-573
1873 1st quarter, Monthly Report on the Commerce and Navigation of the United States, 1872-73, pp. 110-115
2nd quarter, pp. 250-255
3rd quarter, pp. 366-370
4th quarter, pp. 472-478
Summary for year, pp. 481-483
1874 1st quarter, Monthly Report on the Commerce and Navigation of the United States, 1873-74, pp. 111-118
2nd quarter, pp. 245-252
3rd quarter, pp. 379-384
4th quarter, pp. 495-499
Summary for year, pp. 504-507
1875 1st quarter, Monthly Report on the Commerce and Navigation of the United States, 1874-75, pp. 113-120
2nd quarter, pp. 247-249
3rd quarter, pp. 352-358
4th quarter, pp. 443-450
Summary for year, pp. 452-455
1876 1st quarter, Quarterly Reports of the Chief of the Bureau of Statistics, 1875-76, pp. 60-76
2nd quarter, pp. 216-232
3rd quarter, pp. 350-365
4th quarter, pp. 463-480
Summary for year, pp. 507-514
1877 1st quarter, Quarterly Reports of the Chief of the Bureau of Statistics, 1876-77, pp. 64-81
2nd quarter, pp. 204-220
3rd quarter, pp. 381-395
4th quarter, pp. 434-450
Summary for year, pp. 547-554
1878 1st quarter, Quarterly Reports of the Chief of the Bureau of Statistics, 1877-78, pp. 113-127
2nd quarter, pp. 228-233
3rd quarter, pp. 348-353
4th quarter, pp. 442-447
Summary for year, pp. 469-475
1879 1st quarter, Quarterly Reports of the Chief of the Bureau of Statistics, 1878-79, pp. 45-48
2nd quarter, pp. 158-161
3rd quarter, pp. 264-268
4th quarter, pp. 374-378
Summary for year, pp. 394-400
1880 1st quarter, Quarterly Reports of the Chief of the Bureau of Statistics, 1879-80, pp. 45-49
2nd quarter, pp. 172-176
3rd quarter, pp. 272-274
4th quarter, pp. 461-462
1881    1st quarter, Quarterly Reports of the Chief of the Bureau of Statistics,
            1880-81, pp. 43-46
        2nd quarter, pp. 178-182
        3rd quarter, pp. 306-310
        4th quarter, pp. 409-414
        Summary for year, pp. 464-468
1882    1st quarter, Quarterly Reports of the Chief of the Bureau of Statistics,
            1881-82, pp. 46-50
        2nd quarter, pp. 178-183
        3rd quarter, pp. 344-348
        4th quarter, pp. 440-446
        Summary for year, pp. 496-500
1883    1st quarter, Quarterly Reports of the Chief of the Bureau of Statistics,
            1882-83, pp. 48-54
        2nd quarter, pp. 206-212
        3rd quarter, pp. 408-412
        4th quarter, pp. 514-520
        Summary for year, pp. 568-572
1884    1st quarter, Quarterly Reports of the Chief of the Bureau of Statistics,
            1883-84, pp. 31-34
        2nd quarter, pp. 165-170
        3rd quarter, pp. 341-344
        4th quarter, pp. 465-468
        Summary for year, pp. 526-535
1885    1st quarter, Quarterly Reports of the Chief of the Bureau of Statistics,
            1884-85, pp. 62-65
        2nd quarter, pp. 312-315
        3rd quarter, pp. 518-522
        4th quarter, pp. 664-667
1886    1st quarter, Quarterly Reports of the Chief of the Bureau of Statistics,
            1885-86, pp. 77-79
        2nd quarter, pp. 320-322
        3rd quarter, pp. 516-518
        4th quarter, pp. 700-702
        Summary for year, pp. 824-828
1887    1st quarter, Quarterly Reports of the Chief of the Bureau of Statistics,
            1886-87, pp. 82-84
        2nd quarter, pp. 342-344
        3rd quarter, pp. 616-618
        4th quarter, pp. 774-776
        Summary for year, pp. 884-888
1888    1st quarter, Quarterly Reports of the Chief of the Bureau of Statistics,
            1887-88, pp. 88-90
        2nd quarter, pp. 364-368
        3rd quarter, pp. 552-554
        4th quarter, pp. 864-866
        Summary for year, pp. 965-968
1889    1st quarter, Quarterly Reports of the Chief of the Bureau of Statistics,
            1888-89, No. 1, pp.
        2nd quarter, No. 2, pp.
        3rd quarter, No. 3, pp.
        4th quarter, No. 4, pp.
APPENDIX C: MANIFEST REQUIREMENTS IN IMMIGRATION ACTS
Act of June 25, 1798 (1 Stat. 570) 


Expired in two years. 

Sec. 3. And be it further enacted, That every master or commander of any ship or vessel 
which shall come into any port of the United States after the first day of July next, shall 
immediately on his arrival make report in writing to the collector or other chief officer of 
the customs of such port, of all aliens, if any, on board his vessel, specifying their names, 
age, the place of nativity, the country from which they shall have come, the nation to 
which they belong and owe allegiance, their occupation and a description of their persons, 
as far as he shall be informed thereof, and on failure, every such master and commander 
shall forfeit and pay three hundred dollars, for the payment whereof on default of such 
master or commander, such vessel shall also be holden, and may by such collector or other
officer of the customs be detained. And it shall be the duty of such collector or other 
officer of the customs, forthwith to transmit to the office of the department of state true 
copies of all such returns. 


Act of March 2, 1819 (3 Stat. 489) 


Sec. 4. And be it further enacted, That the captain or master of any ship or vessel arriving 
in the United States, or any of the territories thereof, from any foreign place whatever, at 
the same time that he delivers a manifest of the cargo, and, if there be no cargo, then 
at the time of making report or entry of the ship or vessel, pursuant to the existing laws 
of the United States, shall also deliver and report, to the collector of the district in which 
such ship or vessel shall arrive, a list or manifest of all the passengers taken on board of
the said ship or vessel at any foreign port or place; in which list or manifest it shall be the 
duty of the said master to designate, particularly, the age, sex, and occupation, of the said 
passengers, respectively, the country to which they severally belong, and that of which it 
is their intention to become inhabitants; and shall further set forth whether any, and what 
number, have died on the voyage; which report and manifest shall be sworn to by the said 
master, in the same manner as is directed by the existing laws of the United States, in rela- 
tion to the manifest of the cargo, and that the refusal or neglect of the master aforesaid, to 
comply with the provisions of this section, shall incur the same penalties, disabilities, and 
forfeitures, as are at present provided for a refusal or neglect to report and deliver a 
manifest of the cargo aforesaid. 
Sec. 5. And be it further enacted, That each and every collector of the customs, to whom
such manifest or list of passengers as aforesaid shall be delivered, shall, quarter yearly, 
return copies thereof to the Secretary of State of the United States, by whom statements 
of the same shall be laid before Congress at each and every session. 


Act of March 3, 1855 (10 Stat. 719) 


Sec. 12. And be it further enacted, That the captain or master of any ship or vessel ar- 
riving in the United States, or any of the Territories thereof, from any foreign place 
whatever, at the same time that he delivers a manifest of the cargo, and if there be no 
cargo, then at the time of making report or entry of the ship or vessel, pursuant to law, 
shall also deliver and report to the collector of the district in which such ship or vessel 
shall arrive, a list or manifest of all the passengers taken on board of the said ship or vessel 
at any foreign port or place; in which list or manifest it shall be the duty of the said master 
to designate particularly the age, sex, and occupation of the said passengers respectively, 
the part of the vessel occupied by each during the voyage, the country to which they 
severally belong, and that of which it is their intention to become inhabitants; and shall 
further set forth whether any and what number have died on the voyage; which list or 
manifest shall be sworn to by the said master, in the same manner as directed by law in 
relation to the manifest of the cargo; and the refusal or neglect of the master aforesaid 
to comply with the provisions of this section, or any part thereof, shall incur the same 
penalties, disabilities, and forfeitures as are provided for a refusal or neglect to report and 
deliver a manifest of the cargo aforesaid. 

Sec. 13. And be it further enacted, That each and every collector of the customs, to whom 
such manifest or list of passengers as aforesaid shall be delivered, shall quarter-yearly 
return copies thereof to the Secretary of State of the United States, by whom statements 
of the same shall be laid before Congress at each and every session. 


Act of May 6, 1882 (22 Stat. 58) 


Sec. 8. That the master of any vessel arriving in the United States from any foreign 
port or place shall, at the same time he delivers a manifest of the cargo, and if there be 
no cargo, then at the time of making a report of the entry of the vessel pursuant to law, 
in addition to the other matter required to be reported, and before landing, or permitting 
to land, any Chinese passengers, deliver and report to the collector of customs of the dis- 
trict in which such vessels shall have arrived a separate list of all Chinese passengers 
taken on board his vessel at any foreign port or place, and all such passengers on board 
the vessel at that time. Such list shall show the names of such passengers (and if accredited 
officers of the Chinese Government travelling on the business of that government, or 
their servants, with a note of such facts), and the names and other particulars, as shown 
by their respective certificates; and such list shall be sworn to by the master in the manner 
required by law in relation to the manifest of the cargo. Any willful refusal or neglect of 
any such master to comply with the provisions of this section shall incur the same penalties 
and forfeiture as are provided for a refusal or neglect to report and deliver a manifest of 
the cargo. (Amended with slight change of wording by Act of July 5, 1884, 23 Stat. 115.) 


Act of August 2, 1882 (22 Stat. 189) 


Sec. 9. That it shall not be lawful for the master of any such steamship or other vessel, 
not in distress, after the arrival of the vessel within any collection district of the United 
States, to allow any person or persons, except a pilot, officer of the customs, or health 
officer, agents of the vessel, and consuls, to come on board of the vessel, or to leave the 
vessel, until the vessel has been taken in charge by an officer of the customs, nor, after 
charge so taken, without leave of such officer, until all the passengers, with their baggage, 
have been duly landed from the vessel; and on the arrival of any such steamship or other 
vessel within any collection district of the United States, the master thereof shall deliver 
to the officer of customs who first comes on board the vessel and makes demand therefor 
a correct list, signed by the master, of all the passengers taken on board the vessel at any 
foreign port or place, specifying separately the names of the cabin passengers, their age, 
sex, calling, and the country of which they are citizens, and the number of pieces of baggage
belonging to each passenger, and also the name, age, sex, calling, and native country of 
each emigrant passenger, or passengers other than cabin passengers, and their intended 
destination or location, and the number of pieces of baggage belonging to each passenger, 
and also the location of the compartment or space occupied by each of such passengers 
during the voyage; and if any of such passengers died on the voyage, the said list shall 
specify the name, age, and cause of death of each deceased passenger; and a duplicate of 
the aforesaid list of passengers, verified by the oath of the master, shall, with the manifest 
of the cargo, be delivered by the master to the collector of customs on the entry of the 
vessel. For a violation of either of the provisions of this section, or for permitting or 
neglecting to prevent a violation thereof, the master of the vessel shall be liable to a fine 
not exceeding one thousand dollars. 


Act of September 13, 1888 (25 Stat. 476) 


Sec. 4. That the master of any vessel arriving in the United States from any foreign port 
or place with any Chinese passengers on board shall, when he delivers his manifest of 
cargo, and if there be no cargo, when he makes legal entry of his vessel, and before landing 
or permitting to land any Chinese person (unless a diplomatic or consular officer, or at- 
tendant of such officer), deliver to the collector of customs of the district in which the 
vessel shall have arrived the sealed certificates and letters as aforesaid, and a separate 
list of all Chinese persons taken on board of his vessel at any foreign port or place, and 
of all such persons on board at the time of arrival as aforesaid. Such list shall show the 
names of such persons and other particulars as shown by their open certificates, or other 
evidences required by this Act, and such list shall be sworn to by the master in the manner 
required by law in relation to the manifest of the cargo. 

The master of any vessel as aforesaid shall not permit any Chinese diplomatic or con- 
sular officer or attendant of such officer to land without having first been informed by the 
collector of customs of the official character of such officer or attendant. Any refusal or 


willful neglect of the master of any vessel to comply with the provisions of this section 
shall incur the same penalties and forfeitures as are provided for a refusal or neglect to 
report and deliver a manifest of the cargo. 


Act of March 3, 1891 (26 Stat. 1084) 


Sec. 8. That upon the arrival by water at any place within the United States of any alien 
immigrants it shall be the duty of the commanding officer and the agents of the steam or 
sailing vessel by which they came to report the name, nationality, last residence, and 
destination of every such alien, before any of them are landed, to the proper inspection 
officers. . . .

That the Secretary of the Treasury may prescribe rules for inspection along the borders 
of Canada, British Columbia, and Mexico so as not to obstruct or unnecessarily delay, 
impede, or annoy passengers in ordinary travel between said countries. . . .


Act of March 3, 1893 (27 Stat. 569) 


Be it enacted by the Senate and House of Representatives of the United States of 
America in Congress assembled, That, in addition to conforming to all present require- 
ments of law, upon the arrival of any alien immigrants by water at any port within the 
United States, it shall be the duty of the master or commanding officer of the steamer or 
sailing vessel having said immigrants on board to deliver to the proper inspector of im- 
migration at the port lists or manifests made at the time and place of embarkation of 
such alien immigrants on board such steamer or vessel, which shall, in answer to questions 
at the top of said lists, state as to each immigrant the full name, age, and sex, whether 
married or single; the calling or occupation; whether able to read or write; the nationality; 
the last residence; the seaport for landing in the United States; the final destination, if 
any, beyond the seaport of landing; whether having a ticket through to such final desti- 
nation; whether the immigrant has paid his own passage or whether it has been paid by 
other persons or by any corporation, society, municipality, or government; whether in 
possession of money, and if so, whether upwards of thirty dollars and how much if thirty
dollars or less; whether going to join a relative, and if so, what relative and his name and 
address; whether ever before in the United States, and if so, when and where; whether 
ever in prison or almshouse or supported by charity; whether a polygamist; whether under 
contract, express or implied, to perform labor in the United States; and what is the im- 
migrant’s condition of health mentally and physically, and whether deformed or crippled, 
and if so, from what cause. 

Sec. 2. That the immigrant shall be listed in convenient groups and no one list or 
manifest shall contain more than thirty names. 

To each immigrant or head of a family shall be given a ticket on which shall be written 
his name, a number or letter designating the list, and his number on the list, for conven- 
ience of identification on arrival. Each list or manifest shall be verified by the signature 
and the oath or affirmation of the master or commanding officer or of the officer first or 
second below him in command, taken before the United States consul or consular agent 
at the port of departure, before the sailing of said vessel, to the effect that he has made a 
personal examination of each and all of the passengers named therein, and that he has 
caused the surgeon of said vessel sailing therewith to make a physical examination of each 
of said passengers, and that from his personal inspection and the report of said surgeon 
he believes that no one of said passengers is an idiot or insane person, or a pauper or likely 
to become a public charge, or suffering from a loathsome or dangerous contagious disease, 
or a person who has been convicted of a felony or other infamous crime or misdemeanor
involving moral turpitude, or a polygamist, or under a contract or agreement, express or 
implied, to perform labor in the United States, and that also, according to the best of his 
knowledge and belief, the information in said list or manifest concerning each of said pas- 
sengers named therein is correct and true. 

Sec. 3. That the surgeon of said vessel sailing therewith shall also sign each of said lists 
or manifests before the departure of said vessel, and make oath or affirmation in like 
manner before said consul or consular agent, stating his professional experience and quali- 
fications as a physician and surgeon, and that he has made a personal examination of 
each of the passengers named therein and that said list or manifest, according to the best 
of his knowledge and belief, is full, correct, and true in all particulars relative to the 
mental and physical condition of said passengers. If no surgeon sails with any vessel 
bringing alien immigrants, the mental and physical examinations and the verifications of 
the lists or manifests may be made by some competent surgeon employed by the owners 
of the vessel. 

Sec. 4. That in the case of the failure of said master or commanding officer of said vessel 
to deliver to the said inspector of immigration lists or manifests, verified as aforesaid, 
containing the information above required as to all alien immigrants on board, there shall 
be paid to the collector of customs at the port of arrival the sum of ten dollars for each 
immigrant qualified to enter the United States concerning whom the above information 
is not contained in any list as aforesaid, or said immigrant shall not be permitted so to 
enter the United States, but shall be returned like other excluded persons. 


Act of March 3, 1903 (32 Stat. 1213) 


Sec. 12. That upon the arrival of any alien by water at any port within the United 
States it shall be the duty of the master or commanding officer of the steamer, sailing or 
other vessel, having said alien on board to deliver to the immigration officers at the port 
of arrival lists or manifests made at the time and place of embarkation of such alien on 
board such steamer or vessel, which shall, in answer to questions at the top of said lists, 
state as to each alien the full name, age, and sex; whether married or single; the calling 
or occupation; whether able to read or write; the nationality; the race; the last residence; 
the seaport for landing in the United States; the final destination, if any, beyond the port 
of landing; whether having a ticket through to such final destination; whether the alien 
has paid his own passage, or whether it has been paid by any other person or by any 
corporation, society, municipality, or government, and if so, by whom; whether in posses- 
sion of fifty dollars, and if less, how much; whether going to join a relative or friend, and 
if so, what relative or friend and his name and complete address; whether ever before in
the United States, and if so, when and where; whether ever in prison or almshouse or an 
institution or hospital for the care and treatment of the insane or supported by charity; 
whether a polygamist; whether an anarchist; whether coming by reason of any offer, 
solicitation, promise or agreement, expressed or implied, to perform labor in the United 
States, and what is the alien’s condition of health mental and physical, and whether 
deformed or crippled, and if so, for how long and from what cause. 

Sec. 13. That all aliens arriving by water at the ports of the United States shall be listed 
in convenient groups, and no one list or manifest shall contain more than thirty names. 
To each alien or head of a family shall be given a ticket on which shall be written his name,
a number or letter designating the list in which his name, and so forth, is contained, and
his number on said list, for convenience of identification on arrival. Each list or manifest 
shall be verified by the signature and the oath or affirmation of the master or commanding 
officer or the first or second below him in command, taken before an immigration officer 
at the port of arrival, to the effect that he has caused the surgeon of said vessel sailing 
therewith to make a physical and oral examination of each of said aliens, and that from 
the report of said surgeon and from his own investigation he believes that no one of said 
aliens is an idiot, or insane person, or a pauper, or is likely to become a public charge, 
or is suffering from a loathsome or a dangerous contagious disease, or is a person who 
has been convicted of a felony or other crime or misdemeanor involving moral turpitude, 
or a polygamist, or an anarchist, or under promise or agreement, express or implied, to 
perform labor in the United States, or a prostitute, and that also, according to the best 
of his knowledge and belief, the information in said lists or manifests concerning each of 
said aliens named therein is correct and true in every respect. 

Sec. 14. That the surgeon of said vessel sailing therewith shall also sign each of said 
lists or manifests and make oath or affirmation in like manner before an immigration 
officer at the port of arrival, stating his professional experience and qualifications as a 
physician and surgeon, and that he has made a personal examination of each of the said 
aliens named therein, and that the said list or manifest, according to the best of his knowl- 
edge and belief, is full, correct, and true in all particulars relative to the mental and 
physical condition of said aliens. If no surgeon sails with any vessel bringing aliens the 
mental and physical examinations and the verifications of the lists or manifests shall be 
made by some competent surgeon employed by the owners of the said vessel. 

Sec. 15. That in the case of the failure of the master or commanding officer of any vessel 
to deliver to the said immigration officers lists or manifests of all aliens on board thereof 
as required in sections twelve, thirteen, and fourteen of this Act, he shall pay to the col- 
lector of customs at the port of arrival the sum of ten dollars for each alien concerning 
whom the above information is not contained in any list as aforesaid. 


Act of February 20, 1907 (34 Stat. 898) 


Sec. 12. That upon the arrival of any alien by water at any port within the United 
States it shall be the duty of the master or commanding officer of the steamer, sailing or 
other vessel having said alien on board to deliver to the immigration officers at the port of 
arrival lists or manifests made at the time and place of embarkation of such alien on board 
such steamer or vessel, which shall, in answer to questions at the top of said list, state as 
to each alien the full name, age, and sex; whether married or single; the calling or occupa- 
tion; whether able to read or write; the nationality; the race; the last residence; the name 
and address of the nearest relative in the country from which the alien came; the seaport 
for landing in the United States; the final destination, if any, beyond the port of landing; 
whether having a ticket through to such final destination; whether the alien has paid his 
own passage or whether it has been paid by any other person or by any corporation, 
society, municipality, or government, and if so, by whom; whether in possession of fifty 
dollars, and if less, how much; whether going to join a relative or friend, and if so, what 
relative or friend, and his or her name and complete address; whether ever before in the 
United States, and if so, when and where; whether ever in prison or almshouse or an insti- 
tution or hospital for the care and treatment of the insane or supported by charity; 
whether a polygamist; whether an anarchist; whether coming by reason of any offer,
solicitation, promise, or agreement, express or implied, to perform labor in the United 
States, and what is the alien’s condition of health, mental and physical, and whether de- 
formed or crippled, and if so, for how long and from what cause; that it shall further be the 
duty of the master or commanding officer of every vessel taking alien passengers out of
the United States, from any port thereof, to file before departure therefrom with the col- 
lector of customs of such port a complete list of all such alien passengers taken on board. 
Such list shall contain the name, age, sex, nationality, residence in the United States, 
occupation, and the time of last arrival of every such alien in the United States, and no 
master of any such vessel shall be granted clearance papers for his vessel until he has de- 
posited such list or lists with the collector of customs at the port of departure and made 
oath that they are full and complete as to the name and other information herein required 
concerning each alien taken on board his vessel; and any neglect or omission to comply 
with the requirements of this section shall be punishable as provided in section fifteen 
of this Act. That the collector of customs with whom any such list has been deposited in 
accordance with the provisions of this section, shall promptly notify the Commissioner- 
General of Immigration that such list has been deposited with him as provided, and shall 
make such further disposition thereof as may be required by regulations to be issued by 
the Commissioner-General of Immigration with the approval of the Secretary of Commerce
and Labor: Provided, That in the case of vessels making regular trips to ports of the United 
States the Commissioner-General of Immigration, with the approval of the Secretary of 
Commerce and Labor, may, when expedient, arrange for the delivery of such lists of out- 
going aliens at a later date: Provided further, That it shall be the duty of the master or 
commanding officer of any vessel sailing from ports in the Philippine Islands, Guam, 
Porto Rico, or Hawaii to any port of the United States on the North American Continent 
to deliver to the immigration officers at the port of arrival lists or manifests made at the 
time and place of embarkation, giving names of all aliens on board said vessel. 

Sec. 13. That all aliens arriving by water at the ports of the United States shall be listed 
in convenient groups, and no one list or manifest shall contain more than thirty names. 
To each alien or head of a family shall be given a ticket on which shall be written his 
name, a number or letter designating the list in which his name, and so forth, is contained, 
and his number on said list, for convenience of identification on arrival. Each list or mani- 
fest shall be verified by the signature and the oath or affirmation of the master or com- 
manding officer, or the first or second below him in command, taken before an immigration 
officer at the port of arrival, to the effect that he has caused the surgeon of said vessel 
sailing therewith to make a physical and oral examination of each of said aliens, and that 
from the report of said surgeon and from his own investigation he believes that no one of 
said aliens is an idiot, or imbecile, or a feeble-minded person, or insane person, or a pauper, 
or is likely to become a public charge, or is afflicted with tuberculosis or with a loathsome 
or dangerous contagious disease, or is a person who has been convicted of, or who admits 
having committed a felony or other crime or misdemeanor involving moral turpitude, or is 
a polygamist or one admitting belief in the practice of polygamy, or an anarchist, or under 
promise or agreement, express or implied, to perform labor in the United States, or a 
prostitute, or a woman or girl coming to the United States for the purpose of prostitution,
or for any other immoral purpose, and that also, according to the best of his knowledge 
and belief, the information in said lists or manifests concerning each of said aliens named 
therein is correct and true in every respect. 

Sec. 14. That the surgeon of said vessel sailing therewith shall also sign each of said lists 
or manifests and make oath or affirmation in like manner before an immigration officer 
at the port of arrival, stating his professional experience and qualifications as a physician 
and surgeon, and that he has made a personal examination of each of the said aliens named 
therein, and that the said list or manifest, according to the best of his knowledge and belief, 
is full, correct, and true in all particulars relative to the mental and physical condition of
said aliens. If no surgeon sails with any vessel bringing aliens the mental and physical 
examinations and the verifications of the lists or manifests shall be made by some com- 
petent surgeon employed by the owners of the said vessel. 
Sec. 15. That in the case of the failure of the master or commanding officer of any vessel
to deliver to the said immigration officers lists or manifests of all aliens on board thereof, 
as required in sections twelve, thirteen, and fourteen of this Act, he shall pay to the col- 
lector of customs at the port of arrival the sum of ten dollars for each alien concerning 
whom the above information is not contained in any list as aforesaid: Provided, That in 
the case of failure without good cause to deliver the list of passengers required by section 
twelve of this Act from the master or commanding officer of every vessel taking alien pas- 
sengers out of the United States, the penalty shall be paid to the collector of customs at 
the port of departure and shall be a fine of ten dollars for each alien not included in said 
list; but in no case shall the aggregate fine exceed one hundred dollars. 


Act of February 5, 1917 (39 Stat. 874) 


Sec. 12. That upon the arrival of any alien by water at any port within the United
States on the North American Continent from a foreign port or a port of the Philippine 
Islands, Guam, Puerto Rico, or Hawaii, or at any port of the said insular possessions from 
any foreign port, from a port in the United States on the North American Continent, or 
from a port of another insular possession of the United States, it shall be the duty of the 
master or commanding officer, owners, or consignees of the steamer, sailing, or other 
vessel having said alien on board to deliver to the immigration officers at the port of arrival 
typewritten or printed lists or manifests made at the time and place of embarkation of 
such alien on board such steamer or vessel, which shall, in answer to questions at the top
of said list, contain full and accurate information as to each alien as follows: Full name, 
age, and sex; whether married or single; calling or occupation; personal description (in- 
cluding height, complexion, color of hair and eyes, and marks of identification); whether 
able to read or write; nationality; country of birth; race; country of last permanent resi- 
dence; name and address of the nearest relative in the country from which the alien came; 
seaport for landing in the United States; final destination, if any, beyond the port of 
landing; whether having a ticket through to such final destination; by whom passage was 
paid; whether in possession of $50, and if less, how much; whether going to join a relative 
or friend, and, if so, what relative or friend, and his or her name and complete address; 
whether ever before in the United States, and if so, when and where; whether ever in prison 
or almshouse or an institution or hospital for the care and treatment of the insane; 
whether ever supported by charity; whether a polygamist; whether an anarchist; whether 
a person who believes in or advocates the overthrow by force or violence of the Govern- 
ment of the United States or of all forms of law, or who disbelieves in or is opposed to 
organized government, or who advocates the assassination of public officials, or who 
advocates or teaches the unlawful destruction of property, or is a member of or affiliated 
with any organization entertaining and teaching disbelief in or opposition to organized 
government, or which teaches the unlawful destruction of property, or who advocates 
or teaches the duty, necessity, or propriety of the unlawful assaulting or killing of any 
officer or officers, either of specific individuals or of officers generally, of the Government 
of the United States or of any other organized government because of his or their official 
character; whether coming by reason of any offer, solicitation, promise, or agreement, 
express or implied, to perform labor in the United States; the alien’s condition of health, 
mental and physical; whether deformed or crippled, and if so, for how long and from what 
cause; whether coming with the intent to return to the country whence such alien comes 
after temporarily engaging in laboring pursuits in the United States; and such other items 
of information as will aid in determining whether any such alien belongs to any of the 
excluded classes enumerated in section three hereof; and such master or commanding 
officer, owners, or consignees shall also furnish information in relation to the sex, age, 
class of travel, and the foreign port of embarkation of arriving passengers who are United 
States citizens. That it shall further be the duty of the master or commanding officer of 
every vessel taking passengers from any port of the United States on the North American 
Continent to a foreign port or a port of the Philippine Islands, Guam, Puerto Rico, or 
Hawaii, or from any port of the said insular possessions to any foreign port, to a port of 
the United States on the North American Continent, or to a port of another insular posses- 
sion of the United States to file with the immigration officials before departure a list which 





IMMIGRATION STATISTICS 1023 


shall contain full and accurate information in relation to the following matters regarding 
all alien passengers, and all citizens of the United States or insular possessions of the 
United States departing with the stated intent to reside permanently in a foreign country, 
taken on board: Name, age, and sex; whether married or single; calling or occupation; 
whether able to read or write; nationality; country of birth; country of which citizen 
or subject; race; last permanent residence in the United States or insular possession there- 
of; if a citizen of the United States or of the insular possessions thereof, whether native 
born or naturalized; if native born, the place and date of birth, or if naturalized the city 
or town in which naturalization has been had; intended future permanent residence; and 
time and port of last arrival in the United States, or insular possessions thereof; and such
master or commanding officer shall also furnish information in relation to the sex, age, 
class of travel, and port of debarkation of the United States citizens departing who do 
not intend to reside permanently in a foreign country, and no master of any such vessel 
shall be granted clearance papers for his vessel until he has deposited such list or lists 
with the immigration officials at the port of departure and made oath that they are full 
and complete as to the name and other information herein required concerning each person 
of the classes specified taken on board his vessel; and any neglect or omission to comply 
with the requirements of this section shall be punishable as provided in section fourteen 
of this Act: Provided, That in the case of vessels making regular trips to ports of the United 
States the Commissioner of Immigration and Naturalization, with the approval of the 
Attorney General, may, when expedient, arrange for the delivery of such lists of outgoing 
aliens at a later date: Provided further, That it shall be the duty of immigration officials 
to record the following information regarding every resident alien and citizen leaving the 
United States by way of the Canadian or Mexican borders for permanent residence in a 
foreign country: Name, age, and sex; whether married or single; calling or occupation; 
whether able to read or write; nationality; country of birth; country of which citizen or 
subject; race; last permanent residence in the United States; intended future permanent 
residence; and time and port of last arrival in the United States; and if a United States 
citizen, whether native born or naturalized. (Later amended by the Act of July 30, 1947.) 

Sec. 13. That all aliens arriving by water at the ports of the United States shall be listed 
in convenient groups, the names of those coming from the same locality to be assembled 
so far as practicable, and no one list or manifest shall contain more than thirty names. 
To each alien or head of a family shall be given a ticket on which shall be written his name, 
a number or letter designating the list in which his name, and other items of information 
required by this Act, are contained, and his number on said list, for convenience of identi- 
fication on arrival. Each list or manifest shall be verified by the signature and the oath 
or affirmation of the master or commanding officer, or the first or second below him in 
command, taken before an immigration officer at the port of arrival, to the effect that he 
has caused the surgeon of said vessel sailing therewith to make a physical and mental 
examination of each of said aliens, and that from the report of said surgeon and from his 
own investigation he believes that no one of said aliens is of any of the classes excluded 
from admission into the United States by section three of this Act, and that also according 
to the best of his knowledge and belief the information in said lists or manifests concerning 
each of said aliens named therein is correct and true in every respect. That the surgeon 
of said vessel sailing therewith shall also sign each of said lists or manifests and make 
oath or affirmation in like manner before an immigration officer at the port of arrival, 
stating his professional experience and qualifications as a physician and surgeon, and 
that he has made a personal examination of each of the said aliens named therein and that 
the said list or manifest, according to the best of his knowledge and belief, is full, correct, 
and true in all particulars relative to the mental and physical condition of said aliens. If 
no surgeon sails with any vessel bringing aliens, the mental and physical examinations and
the verifications of the lists or manifests shall be made by some competent surgeon 
employed by the owners of the said vessels, and the manifests shall be verified by such 
surgeon before a United States consular officer or other officer authorized to administer 
oaths: Provided, That if any changes in the condition of such aliens occur or develop during 
the voyage of the vessel on which they are traveling, such changes shall be noted on the 
manifest before the verification thereof. 

Sec. 14. That it shall be unlawful for the master or commanding officer of any vessel
bringing aliens into or carrying aliens out of the United States to refuse or fail to deliver 
to the immigration officials the accurate and full manifests or statements or information 
regarding all aliens on board or taken on board such vessel required by this Act, and if it 
shall appear to the satisfaction of the Attorney General that there has been such a refusal
or failure, or that the lists delivered are not accurate and full, such master or commanding 
officer shall pay to the collector of customs at the port of arrival or departure the sum of 
$10 for each alien concerning whom such accurate and full manifest or statement or in- 
formation is not furnished, or concerning whom the manifest or statement or information 
is not prepared and sworn to as prescribed by this Act. No vessel shall be granted clearance 
pending the determination of the question of the liability to the payment of such fine, or 
while it remains unpaid, nor shall such fine be remitted or refunded: Provided, That clear- 
ance may be granted prior to the determination of such question upon the deposit with the 
collector of customs of a sum sufficient to cover such fine. 


Act of July 30, 1947 (61 Stat. 630) 


(Amended sec. 12 of the Act of February 5, 1917 to read as follows): 

Sec. 12. That upon the arrival of any alien, United States citizen, or national, by water 
at any port within the United States on the North American Continent from a foreign
port or port of Guam, Puerto Rico, Hawaii, or other insular possession of the United 
States, or at any port of the said insular possessions from any foreign port, from a port in 
the United States on the North American Continent, or from a port of another insular 
possession of the United States, it shall be the duty of the master or commanding officer, 
owners, or consignees of the steamer, sailing, or other vessel, having said alien, United 
States citizen, or national on board to deliver to the immigration officers at the port of 
arrival typewritten or printed lists or manifests made at the time and place of embarkation 
of such alien, United States citizen, or national on board such steamer or vessel, and such 
lists or manifests shall be in such form and contain such information as the Commissioner 
of Immigration and Naturalization, with the approval of the Attorney General, shall by 
regulation prescribe as necessary for the identification of the persons transported and for 
the enforcement of the immigration laws. That it shall further be the duty of the master 
or commanding officer of every vessel taking passengers from any port of the United 
States on the North American Continent to a foreign port or a port of Guam, Puerto Rico, 
Hawaii, or other insular possession of the United States, or from any port of the said 
insular possessions to any foreign port, to a port of the United States on the North 
American Continent, or to a port of another insular possession of the United States to 
file with the immigration officials before departure a list of all aliens, United States citi- 
zens, or nationals, taken on board, said list to be in such form and to contain such infor- 
mation as the Commissioner of Immigration and Naturalization, with the approval of the 
Attorney General, shall by regulation prescribe as necessary for the identification of the 
persons transported and for the enforcement of the immigration laws. No master or 
commanding officer of any such vessel shall be granted clearance papers for his vessel 
until he has deposited such list or lists with the immigration officials at the port of depar- 
ture and made oath that they are full and complete as to the information required to be 
contained therein. Any neglect or omission to comply with the requirements of this section 
shall be punishable as provided in section 14 of this Act: Provided, That in the case of ves- 
sels making regular trips to ports of the United States the Commissioner of Immigration 
and Naturalization, with the approval of the Attorney General, may, when expedient, 
arrange for the delivery of lists of outgoing aliens, United States citizens, or nationals 
at a later date: Provided further, That it shall be the duty of immigration officials to record 
the following information regarding every resident alien and citizen or national leaving 
the United States by way of the Canadian or Mexican borders for permanent residence 
in a foreign country: Names, age, and sex; whether married or single; calling or occupation; 
whether able to read or write; nationality; country of birth; country of which citizen or 
subject; race; last permanent residence in the United States; intended future permanent 
residence; and time and port of last arrival in the United States; and if a United States 
citizen, or national, the facts on which claim to that status is based. 


Act of June 27, 1952 (66 Stat. 163)


Sec. 231. (a) Upon the arrival of any person by water or by air at any port within the 
United States from any place outside the United States, it shall be the duty of the master 
or commanding officer, or authorized agent, owner, or consignee of the vessel or aircraft, 
having any such person on board to deliver to the immigration officers at the port of arrival 
typewritten or printed lists or manifests of the persons on board such vessel or aircraft. 
Such lists or manifests shall be prepared at such time, be in such form and shall contain 
such information as the Attorney General shall prescribe by regulation as being necessary 
for the identification of the persons transported and for the enforcement of the immigra- 
tion laws. This subsection shall not require the master or commanding officer, or authorized 
agent, owner, or consignee of a vessel or aircraft to furnish a list or manifest relating (1) 
to an alien crewman or (2) to any other person arriving by air on a trip originating in 
foreign contiguous territory, except (with respect to such arrivals by air) as may be re- 
quired by regulations issued pursuant to section 239. 

(b) It shall be the duty of the master or commanding officer or authorized agent of every 
vessel or aircraft taking passengers on board at any port of the United States, who are 
destined to any place outside the United States, to file with the immigration officers before 
departure from such port a list of all such persons taken on board. Such list shall be in such 
form, contain such information, and be accompanied by such documents, as the Attorney 
General shall prescribe by regulation as necessary for the identification of the persons so 
transported and for the enforcement of the immigration laws. No master or commanding 
officer of any such vessel or aircraft shall be granted clearance papers for his vessel or air- 
craft until he or the authorized agent has deposited such list or lists and accompanying 
documents with the immigration officer at such port and made oath that they are full and 
complete as to the information required to be contained therein, except that in the case 
of vessels or aircraft which the Attorney General determines are making regular trips to 
ports of the United States, the Attorney General may, when expedient, arrange for the 
delivery of lists of outgoing persons at a later date. This subsection shall not require the 
master or commanding officer, or authorized agent, owner, or consignee of a vessel or air- 
craft to furnish a list or manifest relating (1) to an alien crewman or (2) to any other 
person departing by air on a trip originating in the United States who is destined to foreign 
contiguous territory, except (with respect to such departure by air) as may be required 
by regulations issued pursuant to section 239. 

(c) The Attorney General may authorize immigration officers to record the following 
information regarding every resident person leaving the United States by way of the Ca- 
nadian or Mexican borders for permanent residence in a foreign country: Names, age, and 
sex; whether married or single; calling or occupation; whether able to read or write; 
nationality; country of birth; country of which citizen or subject; race; last permanent 
residence in the United States; intended future permanent residence; and time and port of 
last arrival in the United States; and if a United States citizen or national, the facts on 
which claim to that status is based. 

(d) If it shall appear to the satisfaction of the Attorney General that the master or 
commanding officer, owner, or consignee of any vessel or aircraft, or the agent of any 
transportation line, as the case may be, has refused or failed to deliver any list or manifest 
required by subsections (a) or (b), or that the list or manifest delivered is not accurate 
and full, such master or commanding officer, owner, or consignee, or agent, as the case 
may be, shall pay to the collector of customs at the port of arrival or departure the sum 
of $10 for each person concerning whom such accurate and full list or manifest is not furn- 
ished, or concerning whom the manifest or list is not prepared and sworn to as prescribed 
by this section or by regulations issued pursuant thereto. No vessel or aircraft shall be 
granted clearance pending determination of the question of the liability to the payment 
of such penalty, or while it remains unpaid, and no such penalty shall be remitted or re- 
funded, except that clearance may be granted prior to the determination of such question 
upon the deposit with the collector of customs of a bond or undertaking approved by the 
Attorney General or a sum sufficient to cover such penalty. 

(e) The Attorney General is authorized to prescribe the circumstances and conditions 
under which the list or manifest requirements of subsections (a) and (b) may be waived. 


Blalock, H. M., Jr., PROBABILISTIC INTERPRETATIONS FOR THE MEAN SQUARE
CONTINGENCY, Vol. 53, No. 281 (March 1958), 102-05.

The author would like to add the following references which were omitted 
when the article was originally published: 

I am indebted to George W. Snedecor, Iowa State College, and to D. A.
Sprott, University of Toronto, for calling attention to an article by E. J. 
Williams [6] and to other references listed below. Williams’ article covers essen- 
tially the same material as that presented under the headings “Method” and 
“Proof” in my paper. In addition to balancing for immediate sequential effects 
when there is an even number of treatments, Williams also gives methods 
whereby, by increasing the number of replications, one can, in certain cases, 
balance for: (a) the effects immediately preceding treatments when there is an 
odd number of treatments, (b) the effects of two preceding treatments and their
interactions, (c) the main effects of any number of preceding treatments. He 
also takes up the analysis of variance. Methods of counterbalancing for sequen- 
tial effects are further developed in subsequent articles by Williams [7] and by 
Patterson [2, 3, 4]. The second edition of Cochran and Cox [1] gives the analysis 
of variance for Williams’ counterbalanced design as well as for an alternative 
design which makes the estimates of direct and residual effects orthogonal. It 
also gives some squares counterbalanced for immediate sequential effects, but 
without presenting the general method of obtaining them. An application of 
Williams’ design is given in [5]. Further references relevant to cross-over and 
switchback designs are listed in [1]. 
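
One common cyclic construction of a Latin square balanced for immediate sequential effects with an even number of treatments, of the kind discussed by Williams [6], can be sketched in Python as follows. The first row runs 0, 1, n-1, 2, n-2, ...; each later row adds 1 (mod n) to every entry. Rows are read as subjects and columns as periods. This is a minimal sketch with hypothetical function names; we do not claim it reproduces the tables of Williams or of the author exactly.

```python
def williams_square(n):
    """Cyclic Latin square for an even number n of treatments (labelled 0..n-1).

    First row: 0, 1, n-1, 2, n-2, 3, ...; row i adds i (mod n) to every entry.
    """
    if n < 2 or n % 2:
        raise ValueError("this sketch covers an even number of treatments only")
    first, low, high = [0], 1, n - 1
    for k in range(1, n):
        if k % 2:          # odd positions take the next small label
            first.append(low)
            low += 1
        else:              # even positions take the next large label
            first.append(high)
            high -= 1
    return [[(t + i) % n for t in first] for i in range(n)]


def is_latin(square):
    """Each treatment appears once in every row and every column."""
    n = len(square)
    full = set(range(n))
    rows_ok = all(set(row) == full for row in square)
    cols_ok = all({row[j] for row in square} == full for j in range(n))
    return rows_ok and cols_ok


def balanced_for_immediate_predecessors(square):
    """Every ordered pair (a, b), a != b, occurs exactly once with b directly after a."""
    n = len(square)
    pairs = [(row[j], row[j + 1]) for row in square for j in range(n - 1)]
    return len(set(pairs)) == len(pairs) == n * (n - 1)


for row in williams_square(4):
    print(row)
```

For n = 4 this prints the rows [0, 1, 3, 2], [1, 2, 0, 3], [2, 3, 1, 0] and [3, 0, 2, 1], in which every treatment immediately follows every other treatment exactly once. As the text notes, an odd number of treatments requires more replications; this sketch does not attempt that case.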

[6] Williams, E. J., “Experimental designs balanced for the estimation of residual effects
of treatments,” Australian Journal of Scientific Research, Series A, 2 (1949), 149-68. 

[7] Williams, E. J., “Experimental designs balanced for pairs of residual effects,” Aus- 
tralian Journal of Scientific Research, Series A, 3 (1950), 351-63. 


Goodman, Leo A., and Kruskal, William H., MEASURES OF ASSOCIATION FOR
CROSS CLASSIFICATIONS, Vol. 49, No. 268 (December 1954), 732-64.

One of the authors, William H. Kruskal, points out that the corrected for- 
mula which was listed on p. 578 of the December 1957 issue should be further 
corrected to read as follows: 

$$T = \sqrt{\frac{\chi^2/n}{\sqrt{(\alpha - 1)(\beta - 1)}}}.$$


He also notes that on p. 615 the title of the publication by Hiram C.
Barksdale should read The Use of Survey Research Findings as Legal Evidence
and that the title of the publication by Preston C. Hammer should read The
Computing Laboratory in the University.
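
Kruskal's corrected formula appears to be Tschuprow's coefficient of association, T = [(χ²/n) / √((α − 1)(β − 1))]^(1/2), for a table with α rows and β columns. This is our reading of a damaged line, and the use of n for the sample size is an assumption. A minimal Python sketch computing T from a table of counts, assuming no row or column total is zero:

```python
import math


def tschuprow_t(table):
    """Tschuprow's T for an r x c table of counts given as a list of lists."""
    r, c = len(table), len(table[0])
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(r)) for j in range(c)]
    # Pearson chi-square against the independence fit row_tot * col_tot / n.
    chi2 = 0.0
    for i in range(r):
        for j in range(c):
            expected = row_tot[i] * col_tot[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # (chi^2 / n) divided by sqrt((r - 1)(c - 1)), then the square root.
    return math.sqrt((chi2 / n) / math.sqrt((r - 1) * (c - 1)))


print(tschuprow_t([[10, 0], [0, 10]]))
```

A table with perfect association, such as the diagonal 2 × 2 table above, gives T = 1; a table whose rows are proportional gives T = 0.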

Rosander, A. C., Guterman, H. E., and McKeon, A. J., THE USE OF RANDOM
WORK SAMPLING FOR COST ANALYSIS AND CONTROL, Vol. 53, No. 282 (June
1958), 382-97.

A. W. Reid, Jr. (New Jersey Bell Telephone Company) has pointed out that
equation (2) of the appendix on p. 392 should be corrected to read as follows:


$$P = \tfrac{1}{5}\left(\tfrac{1}{4} + \tfrac{1}{4} + \tfrac{2}{4} + \tfrac{2}{4} + \tfrac{2}{4}\right) = .40.$$
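
The five terms inside the parentheses sum to 2, so a result of .40 requires a leading factor of 1/5. We read the damaged coefficient that way; this is our assumption and not part of Reid's note. A quick exact check in Python:

```python
from fractions import Fraction as F

# Terms inside the parentheses of the corrected equation (2).
terms = [F(1, 4), F(1, 4), F(2, 4), F(2, 4), F(2, 4)]
p = F(1, 5) * sum(terms)  # assumed leading factor 1/5
print(p, float(p))  # 2/5 0.4
```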


L. J. Savage (University of Chicago) draws our attention to the fact that S. G. 
Soal’s name was misspelled on pp. 618 and 621 of the December 1957 issue. 


I. Richard Savage of the Statistics Department at the University of Minne- 
sota (Minneapolis 14) writes that a revision is being made of “Bibliography of 
Nonparametric Statistics and Related Topics,” Journal of the American Statisti- 
cal Association 48 (1953) pp. 844-906. 

Material through 1959 is to be included with more emphasis, it is hoped, on 
applications than previously. References (particularly to the non-English 
literature), reprints, and technical reports on the theory or applications of non- 
parametric statistics would be greatly appreciated. Also, corrections and addi- 
tions to the original bibliography are desired. 


Waugh, Frederick V. and Fox, Karl A., GRAPHIC COMPUTATION OF R1.23,
Vol. 52, No. 280 (December 1957), 479-81.

Herman Lasken has pointed out that the following two corrections should be 
made: 

The formula on p. 480, line 4, should read 


2 


b= 1--——. 


On page 480, line 13, the word radius should read hypotenuse. 


Modern Business Statistics. John E. Freund and Frank J. Williams. Englewood Cliffs,
New Jersey: Prentice-Hall, Incorporated. 1958. Pp. ix, 539. $7.50. 


William I. Greenwald, City College of New York


The textbook under review joins a sizable stock of competing books destined for
the same market. Appropriate criteria by which to judge this text include its 
uniqueness, purpose, and audience. 

The title of the book represents an insular conception of statistics. The middle 
word is superfluous and, perhaps, misleading, since the principles and methods of sta- 
tistics are interdisciplinary in nature. The book is meant for the multitude of statis- 
tical novices attending collegiate schools of business. The mathematical equipment 
required as a background for the text is simple and within the customary range of the 
average business student. 

The format of the book is satisfactory and its type face readable. There is a liberal 
use of graphic and tabular illustrations, although the latter, unfortunately, are not 
numbered. The index appears complete. The large number of exercises, with answers 
supplied to the odd-numbered questions, recommends the book to statistics teachers
who require problems emphasizing the application of statistical techniques. The style 
is lucid and often colloquial, although in some patches the writing is blemished and 
incompletely blended. Most of the subjects are closely reasoned, rigorously treated, 
and completely detailed. 

The standard topics are included in the book, so that much ground is covered. The 
authors accomplished the complete job in fewer pages than is customary in similar 
texts, a tribute to good organization. Approximately one-sixth of the pages are de- 
voted to frequency distributions; two-fifths to probability, sampling, and inference; 
one-sixth to regression and correlation; one-fourth to index numbers and time series; 
and the remainder to an introduction and appendixes. The book has a desirable in- 
tellectual unity and, consequently, does not depend completely upon an instructor. 
Most of the material is a sound elementary discussion and explanation of statistical 
foundations. 

References to deficiencies and defects in the text are not meant as criticisms, but 
merely as illustrations of professional differences and human fallibilities. A few 
topics—time series, for instance—are handled in a manner which is too verbal, 
descriptive, traditional, and réchauffé; the notation is unnecessarily cumbersome and 
even its authors occasionally abandon it (e.g., compare the different symbols used for 
the method of least squares in regression and time series); there are occasional excur- 
sions into other fields of knowledge, accompanied by incorrect or incomplete state- 
ments (e.g., the brief discussion of business cycle theories on p. 453, first paragraph); 
there are incidental misuses and abuses of precise terms and concepts—also ideas 
with which one can differ (e.g., the description of the equation for a demand curve 
on p. 287, first paragraph); there are tautological proofs (e.g., the discussion of least 
squares as a criterion for curve fitting on pp. 288-93); a few sections, for example the
one on statistical inference, are too condensed; some topics are completely omitted 
(e.g., time series correlation); a number of subjects are discussed in a cursory fashion 
(about five pages for multiple and partial correlation); and, despite their importance
and pertinence, certain matters are eschewed (such as the economic aspects of index
numbers). 

The above comments and examples refer to controversial matters in which this 
reviewer does not pretend to be a final authority. Criticisms which are not a matter 
of opinion, such as calculation errors, typographical slips, and incorrect formulas, expected
in the first edition of a text, have been forwarded to the publisher for an errata sheet.

What is this reviewer’s overall assessment? 

The book is satisfactory, within its limited ambition, as its creators appear to have 
accomplished the task they set for themselves—to wit—compete with other authors 
at the same level, for the same market, and in the same field. The book was not 
meant to be substantially distinguishable from its potential competitors. 

Those familiar with statistics texts will find it neither a fresh nor unusual experi- 
ence since, frankly, it is imitative. There was no attempt at originality in the text’s 
conception, hence no subject stands out as particularly noteworthy. This is exempli- 
fied by the scarcity of innovations in subject matter, despite the rapid advance and 
diversification of statistical knowledge. This text, mandatory for my students in their 
basic required statistics course at The City College during the 1958 summer session, 
made approximately the same contribution to their statistical education, in my opin- 
ion, as many alternatives which might have been substituted. 

The classroom adoption of this text, therefore, cannot be based upon its net advan- 
tage over others (aside from such considerations as date of publication, change for 
the sake of change, and the nature of the specific educational business program). The 
absence of genuine differentiation in the quality and quantity of this text will be a 
disappointment to those statistics instructors who prefer real freedom of choice in 
selecting textbooks. 


Teach Yourself Statistics. Richard Goodman. London: The English Universities Press, 
Ltd., 1957. Pp. 259. 7/6 net. 


Jules Joskow, City College of New York


It was probably inevitable that “do it yourself” would some time expand from house
painting, plumbing, chicken raising, hi-fi, and the like and move into the field 
of statistics. The title of this book tells us that the time is now. A reading of this ex- 
cellent little work, however, makes this reviewer, at least, suspect that the time is not 
yet quite ripe. 

There is no inconsistency between characterizing the book as excellent, on the 
one hand, and implying failure of the author to live up to the promise of its title, 
on the other. For the book does present a fine condensation of the fundamental mathe- 
matics underlying most of our principal statistical techniques. Within the confines of 
a volume no larger than a “pocket” book the author covers, as his principal topics: 
probability; binomial, Poisson, and normal distributions; bivariate distributions, re-
gression and correlation; sampling theory, the t, χ², and F distributions; analysis of
variance and chi-square. In doing so, his primary interest is the development of the 
mathematical derivations and inter-relationships among the measures discussed. 

Although the author maintains that “The standard of mathematics assumed is not 
high,” it is apparently high enough to allow him to presume the reader’s familiarity 
with differential and integral calculus and with differential equations. His assumption 
allows him, further, to introduce and develop such concepts as the moment-generating 
function in a scant page, and the gamma and beta functions in less than a page each. 

Each chapter of the book concludes with a set of interesting exercises, many of 
which call for further mathematical manipulation of the statistical devices discussed 
in the chapter. In a large number of instances, obviously out of deference to those 
who actually do try to teach themselves, solutions are also provided. 

As would be expected, the book suffers from the restrictions that go hand
in hand with any condensation. Little attention is given to the problem of testing 
alternative hypotheses. The ramifications of the regression and correlation problem 
are given exceedingly cursory treatment. The significance of proportions is given no 
attention at all. Further, the reader who would like to teach himself something about 
descriptive techniques for application in economics and business would, from his reading
of this book, never know of the existence of such things as semi-log grids, index
numbers, seasonal variations, trends, and the like. 

Teach Yourself Statistics provides excellent review reading for those who have at 
some time studied mathematical statistics, and would undoubtedly provide the 
teacher or practitioner with a handy reference volume on the fundamental mathe- 
matics of his profession. It is not likely to, nor should it, replace any of the several 
excellent texts now in general use in introductory courses in the mathematics of 
statistics. It is even less likely to cause any significant unemployment among statistics 
instructors through the development of large masses of statistical “teach yourselfers.” 


Some Aspects of Multivariate Analysis. S. N. Roy. New York: John Wiley & Sons, Inc.; 
Calcutta: Indian Statistical Institute, 1957. Pp. viii, 214. $8.00. 


S. Kullback, George Washington University


This monograph primarily deals with samples from multivariate normal popula-
tions, although there is one chapter which deals with data categorized into con-
tingency tables. The underlying theme of the monograph is to obtain, for samples of
a fixed size, and a preassigned level α, or a confidence coefficient 1 − α, (i) a similar-
region test of a composite null hypothesis which has some reasonably good property
against the class of relevant alternative hypotheses, (ii) a set of simultaneous con- 
fidence bounds on certain parametric functions that are, in a sense, natural measures 
of deviation from the null hypothesis, appropriate to the problems considered, and 
with some good properties in terms of covering wrong values of the deviations. 

The monograph is admittedly not adequate for the needs of a possible user of sta- 
tistics, but should interest advanced students and research workers. 

The proposed procedures depend on a heuristic class of tests using a union-intersection
principle. A type I test for a null hypothesis H₀ against the class of alternative
hypotheses H ∈ Ω has for its region of rejection the union over H ∈ Ω of the most powerful
critical regions of a test of the null hypothesis H₀ against the alternative H. The
region of acceptance of the type I test is the intersection over H ∈ Ω of the complements
of the most powerful critical regions. The level of significance α is the probability
that the sample falls in the region of rejection under the null hypothesis. A
type II test, which is defined in a somewhat similar manner, turns out to be the
likelihood-ratio test. The various hypotheses of interest concerning multivariate
normal populations are transformed to a set of corresponding hypotheses about
univariate or bivariate normal populations, and the union-intersection principle then
permits the calculation of a lower bound for the complex power of the multivariate
test in terms of the simpler powers of univariate or bivariate tests. Similar procedures
are also used to derive simultaneous confidence intervals.
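To make the union-intersection idea concrete, here is a minimal numerical sketch for one case only, assuming the familiar one-sample test that a p-variate normal mean equals a given vector μ₀. Each direction a defines a univariate hypothesis about a′μ, tested by a squared t statistic. The acceptance region is the intersection of these univariate acceptance regions, so the test rejects when the largest t² over all directions exceeds a critical constant, and that largest value is Hotelling's T². The function names are hypothetical and the example is illustrative; it is not taken from Roy's monograph.

```python
import numpy as np

def t_squared_for_direction(x, mu0, a):
    """Squared univariate t statistic for H0: a'mu = a'mu0, using the projected sample x @ a."""
    n = x.shape[0]
    y = x @ a                      # project each p-variate observation onto direction a
    diff = y.mean() - mu0 @ a
    return n * diff**2 / y.var(ddof=1)

def hotelling_t2(x, mu0):
    """One-sample Hotelling T^2 = n (xbar - mu0)' S^{-1} (xbar - mu0), for p >= 2 variates."""
    n = x.shape[0]
    d = x.mean(axis=0) - mu0
    s = np.cov(x, rowvar=False)    # unbiased sample covariance matrix (ddof=1)
    return n * d @ np.linalg.solve(s, d)

rng = np.random.default_rng(0)
x = rng.normal(size=(20, 3)) + np.array([0.3, 0.0, -0.2])   # simulated data, not from any study
mu0 = np.zeros(3)

# Union over directions: the largest univariate t^2 among many random directions ...
dirs = rng.normal(size=(2000, 3))
best_random = max(t_squared_for_direction(x, mu0, a) for a in dirs)

# ... never exceeds T^2, and the direction a = S^{-1}(xbar - mu0) attains it exactly.
a_star = np.linalg.solve(np.cov(x, rowvar=False), x.mean(axis=0) - mu0)
print(best_random <= hotelling_t2(x, mu0) + 1e-9)
print(np.isclose(t_squared_for_direction(x, mu0, a_star), hotelling_t2(x, mu0)))
```

Because the maximizing direction is available in closed form (by the Cauchy-Schwarz inequality), the union over infinitely many univariate tests collapses to a single statistic; the random search only checks this numerically.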

The nonparametric generalizations of analysis of variance and multivariate analysis
in chapter 15 consider various hypotheses for two-way and three-way contingency
tables that are analogues of those in the analysis of variance and multivariate analysis.
Chi-square test criteria and results on large-sample distributions are also given.

Chapters are included discussing the multivariate normal population and the prop- 
erties of samples therefrom, distribution problems, least squares, analysis of variance 
and generalizations, as well as bounds on the power functions, and the confidence 
bounds. 

About 37% of the monograph is devoted to nine useful appendices covering results 
in matrix theory, quadratic forms, transformations, roots of determinantal equations, 
Jacobians, canonical reduction of certain distribution problems, and integration, in 
particular that connected with the distribution of the smallest and largest roots of a 
determinantal equation. 
The format, typography, editing, exposition, and proof-reading of this monograph
are all unfortunately poor and tend to distract and annoy the reader. There is no
index. On page 162 a symbol is defined which will uniformly stand for a certain set
of variates throughout the monograph, but the earlier uses of this symbol are not
clarified until one reaches page 162. The expressions “well known” and “it is easy to”
are used to excess and applied over a range from bare statement to results proved in 
detail. On page 97 a result is stated. The same result on page 111 is “well known” 
and includes a reference to an appendix for the proof. Many further results and proofs 
are promised for later monographs. Chapter 15 could better have been published in 
the version which is in Biometrika, Vol. 43 (1956), pp. 361-76. 

Despite its defects, this monograph will find a useful place in the library of the 
advanced student or research worker in mathematical statistics. 


A Guide to Statistical Calculation. Harold E. Yuker. New York: G. P. Putnam’s Sons. 
1958. Pp. 95. $1.95. 


Gerald J. Lieberman, Stanford University


This book is a manual intended to help non-mathematically oriented students perform
elementary statistical calculations. It deals with techniques of descriptive
statistics and procedures used in estimation and testing hypotheses. Each technique 
is presented in the form of a problem, and the solution to the problem is then outlined 
in a series of steps. Under descriptive statistics, techniques are presented for analyzing
ungrouped data, frequency distributions, central tendency, dispersion, and trans- 
formations. Under estimation and testing hypotheses, topics in correlation, normal 
tests, t-tests, and chi-square tests are discussed. 

In the words of the author, “this book deals only with the calculation and inter- 
pretation of statistics. There is no discussion of when to use a particular statistic or 
the logic underlying its use.” Unfortunately, this leads to major difficulties, since it is 
not always possible to achieve this goal. Hence, the author makes serious errors. For 
example, in discussing regression equations the manual states: “When a relationship 
exists between two variables, i.e., they are correlated, the relationship can be ex- 
pressed by a straight line called a line of regression or regression line.” Of course, the 
author tacitly assumes that the two variables have a joint bivariate normal distribu- 
tion. 

In a later section, the author introduces the concept of the critical ratio. It is de- 
fined as the ratio between the difference of two statistics and the standard error of 
their difference. This critical ratio is always compared to the percentage points of the 
normal distribution. Again, no underlying assumptions or cautions are presented. 
In fact, it is implied that this theory is applicable to any two statistics, including, for 
example, the difference between two variances. Furthermore, in the subsection on 
interpretation of this critical ratio, the author states that the higher the value ob- 
tained for the critical ratio, the more significant the difference is. 
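The reviewer's objection shows up in the arithmetic of the critical ratio itself. The sketch below, with hypothetical function names and made-up numbers, computes the ratio for two independent statistics and refers it to the normal distribution. That reference is sound only when both statistics are approximately normally distributed, as sample means from large samples are, and their standard errors are well estimated.

```python
import math

def critical_ratio(stat1, se1, stat2, se2):
    """(stat1 - stat2) / SE of the difference, assuming the two statistics are independent."""
    return (stat1 - stat2) / math.sqrt(se1**2 + se2**2)

def two_sided_normal_p(z):
    """P(|Z| >= |z|) for a standard normal Z."""
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical example: two independent sample means with their standard errors.
z = critical_ratio(52.0, 1.5, 48.0, 2.0)
print(round(z, 2), round(two_sided_normal_p(z), 4))  # prints 1.6 0.1096
```

For a difference between two sample variances, whose sampling distributions are skewed, this normal reference is generally inappropriate; under normality the usual choice is a variance-ratio (F) test.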

The manual presents a reasonable treatment of descriptive statistics. However, the 
user is cautioned to be very wary of the chapters on statistical inference. 


Quality Control and Applied Statistics Yearbooks, 1956 and 1957. Robert S. Titchen, 
Arnold J. Rosenthal, Bruce Bollerman, and Frank Nastico, Editors. New York: Interscience 
Publishers, 1956 and 1957. Pp. xvi, 1153, and pp. xvi, 1135. $60 per year. 


George J. Resnikoff, Illinois Institute of Technology
Each of the two volumes is a collection of reviews of papers published during the
year given as the volume index. The original papers are from more than three
hundred journals published throughout the world. Although the title of the series
implies that the topics covered are directly related to quality control and applied
statistics, the volumes include some abstracts of articles in business economics,
process control, operations research, mathematical statistics and probability, and 
others. The inclusion of some papers with a seemingly remote connection with sub- 
jects mentioned in the title may be considered a disadvantage to users interested 
only in quality control and applied statistics. The content of the articles ranges from 
theoretical work using advanced mathematics to simple expository material on the 
application of some well-known statistical procedure to a specific situation. 

It is difficult, with so huge a work, to ascertain how completely the fields of interest 
were covered. Certainly no attempt at completeness was made for the related fields; 
however, since the majority of articles deal with quality control and statistical methods,
the work must contain abstracts of a large proportion of the papers published
in these fields during the time-periods covered. 

The abstracts for the most part contain a great deal of information about the 
papers from which they were taken. Many of them contain original data, tables, 
graphs, charts, photographs, etc. For many purposes research workers will not need
to refer to the original work. Each abstract uses at least the whole of a single page; 
some abstracts are four or more pages in length. A single format is used throughout: 
title, author, journal, purpose, summary, results, and abstracter’s name. In some of 
the abstracts references to related work by other authors are appended. It is regret- 
table that in a reference work of this kind this practice was not followed more ex- 
tensively. The typescript is small, but surprisingly easy to read. I compared several 
of the abstracts with the original papers and, for each of these, found that a serious 
effort had been made by the abstracter to give an informative precis of the original 
work. 

The indexing and arrangement of subject matter of the volumes is open to some 
criticism. The abstracts were originally published, sequentially in time, in loose-leaf 
form. Three methods of filing were made available: by page number (as in the present 
volumes); in some alphabetical arrangement; by subject matter code. The most 
advantageous of these methods is the third, since one has available as a unit all ab- 
stracts dealing with a specific technique or situation. In the bound version no order is 
discernible; an abstract on process control may be sandwiched between one on sample 
survey methods and another on mathematical models for a science library. There is 
an author index and a subject index. However, due to the scattering of related sub- 
jects, a great deal of turning of the more than eleven hundred pages in each volume 
may be required. 

Despite these disadvantages and the price of sixty dollars per volume, this series 
should prove a welcome addition to the reference shelves of technical libraries. It is 
an ambitious task being carried out very well. 


Sampling in Sweden: Contributions to the Methods and Theories of Sample Survey Prac- 
tice. Tore Dalenius. Almquist and Wicksell, Stockholm, Sweden, 1957. Pp. vii and 247. 


William G. Madow, Stanford Research Institute


After a preliminary chapter giving a synopsis of sample survey methods and
theories, the following five chapters deal with aspects of sample survey practice 
in Sweden. Lest it appear that these chapters detract from the value of the book 
for the non-Swedish reader I should like to stress that the discussion will be found to 
be of interest to almost all. In discussing Swedish experience, the author introduces,
in connection with events that have actually occurred, the important practical problems
that arise in survey practice and organization, and shows the relation of theoretical
work to the solution of these problems.
Chapters 7 through 11 discuss technical topics in sample surveys, emphasizing
mainly those in which the author has worked. 

The book concludes with a chapter “Prospects of Swedish Sample Survey Prac- 
tice.” 

The book does not emphasize proofs but the technical language used in chapters 
7 through 10 will probably be too difficult for the reader who has not previously 
studied some sampling theory or mathematical statistics. 

Readers who have not previously seen Dalenius’ own research in optimum stratifi- 
cation, the estimation of several parameters from a single sample survey, and the 
use of linear programming techniques in sample survey design will find this book of 
great assistance. 

Almost any statistician concerned with surveys will welcome this book for at least 
one of the following reasons: 

(a) The study of sample surveys in another country than his own. 

(b) The summarization of research previously available only in journal articles. 

(c) The worked out examples and illustrations. 

The only objection this reviewer has is a minor point in chapter 1. Surveys have 
many uses other than description and estimation. 


The Use of Survey Research Findings as Legal Evidence. Hiram C. Barksdale. Pleasant- 
ville, New York, Printer’s Ink Books, a Division of Printer’s Ink Publication Corp. 1957. 
Pp. xxvi, 161. $6.00. 


Walter J. Blenko, Pittsburgh, Pa.


The publisher’s jacket states that the thrust of the book is to show “how opinion
surveys must be conducted to be valid for court evidence”; that it is a “handbook”
for survey technicians and a “reference book” for practicing attorneys. The
book does not fulfill the publisher’s promise in these particulars. It is more of an
essay than a treatise. It does have value in that it gives an interesting historical account
of the use of survey findings in litigation, and shows the wide variety of legal
questions upon which survey evidence has been brought to bear.

As to survey research procedures: The discussion is elementary and, indeed, is 
stated (p. 18) to be only “an introductory description of survey research methods for 
those readers who may not have training or experience in the research field.” It 
offers no discussion whatsoever of statistical methods but merely informs the reader 
that “Through the use of statistical methods and logical reasoning,” the relationships 
among the variables dealt with in a survey “are explored and the meaning of these 
relationships in terms of the problem under study are established.” 

As to the use of survey findings in litigation: Most of the book is addressed to this 
branch of the author’s inquiry. It consists mainly of a discussion of cases, pretty 
much in chronological order, in which survey findings were admitted in evidence or 
were rejected. Little or no attempt is made to state the rationale of the decisions, and 
in many instances the author contents himself with the mere statement that survey 
evidence was, or was not, admitted in a cited case. 

Although the author recognizes (p. 106) that the admissibility of survey findings 
is likely to hinge upon whether they are “properly presented and identified” he as- 
sumes elsewhere (p. 45) that admissibility of survey findings is a question separate 
and apart from their weight or creditability after being admitted. It is more likely, 
however, that in a great majority of cases the court will have considered the two
questions as interrelated and will have admitted (or rejected) survey findings as evidence
only after a showing of the purpose for which they are offered and a sufficient showing
of the character and conduct of the survey to indicate that the findings would qualify,
at least prima facie, as evidence for the stated purpose. It was probably impossible
for the author to examine the records of very many of the cited cases, and unfortunately
the crucial turning points of rulings on evidence are not always clearly
reflected in judicial opinions. The net result is that the crucial chapter of the book,
“Legal Requirements,” does not go very far in particularizing them.


Health Statistics, from the U. S. National Health Survey. Preliminary report on volume 
of Physician Visits, United States, July-September, 1957. U. S. Department of Health, 
Education and Welfare, 1958. Superintendent of Documents, U. S. Government Printing
Office. Pp. 25. $0.25. 


Mark Blumberg, Stanford Research Institute


This slim report is a welcome harbinger of the valuable data expected from the U. S.
National Health Survey. In it are discussed the frequency of physicians’ visits
and the distribution of physicians’ visits by place of visit, by type of service, and by 
time interval since last visit. In the 20 detailed tables these distributions are pre- 
sented by age, sex, and residence. 

The first of the two appendixes discusses briefly the survey design and related 
matters. Very little information concerning sampling errors is presented but more 
is promised for later reports. More detailed descriptions of the survey are available 
(Science, vol. 127, May 30, 1958, pp. 1275-79; and Estadistica, vol. 15, June 1957, pp. 
428-31). 

The second of the two appendixes contains the definitions of terms that are needed 
to make the report understandable. 


Tuberculosis in White and Negro Children 
Volume I. The Roentgenologic Aspects of the Harriet Lane Study. Janet B. Hardy. 
Pp. vii, 122. $7.50. 
Volume II. The Epidemiologic Aspects of the Harriet Lane Study. Miriam E. Brailey. 
Pp. 103. $4.50. 

Published for the Commonwealth Fund by Harvard University Press, 1958. 


Lila Elveback, Tulane University


Volume I is completely clinical in its orientation and will not be reviewed here.

Volume II presents the results of a follow-up study of 1329 children who were 
infected with tuberculosis and of whom none had the benefit of antimicrobial therapy. 
Despite the radical alteration of the prognosis of tuberculous infection in children in
those countries where such medication is available, this study presents results of 
general interest. 

The expository material is detailed and well presented. A wealth of summary ma- 
terial is given concerning other studies. Case histories are given, where required, to 
explain the composition of sub-groups studied.

Section I is entitled “The Prognosis of Tuberculous Infection in Children.” The 
study group was drawn from the Harriet Lane Tuberculosis Clinic, an out-patient 
clinic of the Johns Hopkins Hospital for patients under two years of age. The group 
includes children found to be infected with tuberculosis and their infected older 
siblings under fifteen years of age. The ages at which initial infection took place are 
not known. Cases were admitted to the study from 1928 to 1944 and follow-up was 
continued until 1950. There were 437 white and 892 Negro children, all from families 
unable to pay for medical care. The great majority of the children received only out- 
patient supervision. The selected nature of the study material is clearly recognized 
and carefully discussed. The follow-up was quite successful, and very few of the 
children were lost to observation during the course of the study. 

At first examination miliary tuberculosis and parenchymal involvement were more 
frequently found in Negro children than in white, and lesions were more often 
demonstrable and more serious in the younger age group (less than 3 years of age as
opposed to 3 to 15 years). 

The author uses life table methods in the analysis of mortality and the occurrence
of reinfection (or adult type) pulmonary tuberculosis. For some reason, which is not 
stated, in the study of extrapulmonary tuberculosis no account is taken of deaths 
and withdrawals during the nine years studied. The conclusion that the attack rate 
of extrapulmonary disease is higher in younger children and in those with serious 
pulmonary involvement at first examination seems acceptable on the basis of the 
information given. 

The exposition of the statistical methods used is minimal. The life table methods 
are first approximations which, in view of the low death and withdrawal rates experi- 
enced, are adequate. In tests of significance involving the resulting survivorship 
percentages, simple binomial variances have been used with sample size taken as 
“the equivalent sample size in which there are no withdrawals but which yields the 
same survival rate and the same number of deaths as were observed.” It seems 
doubtful that any of the claims of significance would be invalidated by use of a more 
precise estimate of the variances. 
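The “equivalent sample size” quoted above can be written down directly. A withdrawal-free cohort of size n with survival rate p has n(1 − p) deaths, so matching the observed number of deaths d gives n = d/(1 − p), and the simple binomial variance of p becomes p(1 − p)/n = p(1 − p)²/d. The following is a minimal sketch with hypothetical function names and made-up figures that are not taken from the study; the two-group z statistic at the end is an assumption about how such a comparison might be set up.

```python
import math

def equivalent_sample_size(survival_rate, deaths):
    """Size n of a withdrawal-free cohort with the same survival rate and the same deaths:
    deaths = n * (1 - survival_rate)  =>  n = deaths / (1 - survival_rate)."""
    return deaths / (1.0 - survival_rate)

def binomial_variance(survival_rate, deaths):
    """Simple binomial variance p(1 - p) / n, with n the equivalent sample size."""
    n = equivalent_sample_size(survival_rate, deaths)
    return survival_rate * (1.0 - survival_rate) / n

# Hypothetical survivorship figures for two groups (not taken from the study).
p1, d1 = 0.95, 20   # equivalent n = 400
p2, d2 = 0.90, 45   # equivalent n = 450
z = (p1 - p2) / math.sqrt(binomial_variance(p1, d1) + binomial_variance(p2, d2))
print(round(z, 2))  # prints 2.8
```

Because the equivalent sample size is derived from the deaths, a group with few deaths receives a small n and a correspondingly large variance.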

Section II is entitled “The Risk of Development of Reinfection Pulmonary Tuberculosis.”
In a group of 434 infected white children observed for 5930 person-years
only two cases developed during observation. For the Negro group of 858 the rate 
was much higher, particularly among older females. These differences in rates are 
in general agreement with results of other studies. 

A group of Negro children, all related by blood to a fatal or sputum positive case 
of pulmonary tuberculosis and not ill at the time the infection was discovered, was 
followed for ten years to study the effect of household exposure to a sputum positive 
case. Of the 428 children, 241 were known to have been so exposed at some time
subsequent to the discovery of their own infection and the remaining 187 served as 
controls. The groups were quite well matched with respect to sex, age and initial 
diagnosis. Neither duration and timing of the exposure nor differences in other en- 
vironmental factors were controlled. Prior death is taken as the competitive risk and 
the crude or mixed probability of development of reinfection tuberculosis is found 
to be higher in the group of children known to have been exposed to a sputum positive 
case. The author states “The difference in the two rates is barely twice its standard
deviation and in a statistical sense ought not to be regarded as definitely significant. 
The writer thinks it represents a real difference, but she can appreciate the reluc- 
tance of readers who may view the evidence as inadequate for a definite conclusion 
as to the role of superinfection.” This reader’s reluctance arises from epidemiologic 
rather than statistical considerations and would not be altered by possible refine- 
ments of the significance test used. 

On the whole, although the statistical methods used are elementary, the complete 
and careful account of the results of the follow-up study and the evaluation of the 
limitations of the data make this volume a valuable contribution to the study of 
prognosis of tuberculous infection in children for whom antimicrobial therapy is 
not available. 


America’s Children. Eleanor H. Bernert. For the Social Science Research Council in co- 
operation with the U. S. Department of Commerce, Bureau of the Census. New York: 
John Wiley & Sons, Inc.; London: Chapman and Hall, Ltd., 1958. Pp. xiv, 185. $6.00. 


Ralph W. Tyler, Center for Advanced Study in the Behavioral Sciences


This is one of the volumes which organize and present data obtained from the
Census of 1950. The author’s purpose is “to provide up-to-date knowledge of the 
demographic characteristics of young people, the interrelations of these characteristics,
and their probable future trends.” Most of the tables deal with five age groups:
0-4 years, comprising the preschool ages; 5-9 years, representing early school ages; 
10-14 years and 15-19 years, representing the middle and later school ages, re- 
spectively; 20-24 years, constituting those at the threshold of their working lives. 

In outlining the size and distribution of each age group, changes in the periods 
since 1900 are shown. The high proportions of children in the south as compared 
to the north, in rural areas as compared to urban areas, and in the non-white population
as compared to the white population are pointed out. Chapter 3 is devoted to
examining the correlates of high childhood dependency, that is, the characteristics 
associated with areas and groups in which the per cent of children is high. These 
include low income, little urbanization, poor housing, low expenditures for school- 
ing, and high retardation in the children’s progress through school. Chapter 4 deals 
with living and family arrangements. Ninety-one per cent of the children under 18 
years of age were living with one or both parents, about 5 per cent with one or both 
grandparents, about 2 per cent with other relatives, and 2 per cent or 800,000 children 
were not living with relatives but as foster children, institutional residents, and the 
like. About 2.5 million children under 18 years of age were living in homes broken 
by widowhood and divorce and another 1.5 million children were living in homes 
broken by the parents’ separating. The proportion of broken homes is higher in 
urban than in rural areas. 

Chapters 5 and 6 analyze school enrollments, educational attainments and varia- 
tions in pupil progress through the schools. The great increases in proportions of 
children in school since the Census of 1900 are analyzed. The lower school enroll- 
ments among non-white than white, among females than males, in the south as 
compared to the west, and in the urban areas as compared to the rural farm areas 
are indicated and discussed. The number of years of schooling and the rate of progress 
through school follow much the same pattern as school enrollments. 

Chapters 7 and 8 analyze youth employment. Although the proportion of youth
in school has increased steadily since 1900 and the proportion engaged in full-time 
work has steadily diminished, the number of youth engaged at least part-time in the 
labor force was in 1950 higher than at any previous time. The author examines a 
number of correlates of youth employment, and develops a net-association method 
for comparisons. This method attempts to measure “the association of any given 
characteristic with labor force participation while holding constant the effects of 
another specific characteristic.” This is not wholly sound logically and the resulting 
measures do not contribute much information to the total report. 

This collection of census data and the analyses are very useful to people in educa- 
tion, and in the child welfare services. 


Statistik der Beschaeftigten und Arbeitslosen in der Bundesrepublik Deutschland. 
Schriftenreihe des Bundesarbeitsministeriums: Heft 3. Th. Galland. Stuttgart: W. Kohl- 
hammer Verlag, 1956. Pp. 416. 


Johanna Stern, National Bureau of Economic Research


The 8th International Conference of Labor Statisticians, held at Geneva late in
1954, recommended through its Committee on Employment and Unemployment
Statistics that each of the participating countries prepare a survey of its own methods 
and techniques used currently or planned for future use in compiling such statistics. 
The material thus prepared was to be published as a manual of labor statistics by 
the International Labor Office.¹ This book, “Statistics of the Employed and the
Unemployed in the Federal Republic of Germany,” represents the German contribution
to the projected manual. The author states that he followed closely the Conference
resolution, partly because of the lack of an exactly defined concept of “labor”
statistics, and restricted his survey to the statistics of employment and unemploy-
ment proper. However, the topical limitation he set himself did not prevent him from
stepping across the geographical boundaries implied in the title of his book. Far from
discussing merely the statistical material originating in the Bundesrepublik he has
assembled a comprehensive volume of information on many aspects of employment
and unemployment in Germany prior to World War II as well as in the present
Republic.

¹ International Labour Review, Vol. LXXI, p. 296.

The book comprises three major parts each of which is followed by an appendix. 
The first part deals with the work of the Bundesanstalt fuer Arbeitsvermittlung und 
Arbeitslosenversicherung (Federal Office of Labor Exchanges and Unemployment 
Insurance). The groundwork for the rather recent statistical activities of this agency 
had been laid during the period of planned economy of the totalitarian regime in the 
nineteen-thirties, when all those who were occupied at all, including self-employed 
persons as well as unpaid family workers, had to register with a local labor exchange 
and also were forced to carry a labor identification card. This wholesale registration 
produced a card-index of the entire working population on the basis of which several 
complete employment censuses were conducted between 1938 and 1945. While there 
is now no compulsory registration for the worker, the employer is obliged by law to 
report the turnover of labor in his own establishment and this makes it possible for 
the labor exchanges to keep the records up to date. 

The Bundesanstalt is also responsible for the compilation of monthly unemploy- 
ment statistics similarly based on a system of index cards which reflect the number 
of persons registered with the labor exchanges and which contain the basic data 
needed for the various tabulations. 

The second part covers the work of the Federal Statistical Office insofar as it is 
concerned with labor statistics. After a brief outline of the occupational censuses 
taken prior to World War I and starting in 1882, there follows an analysis of the 
series of censuses for 1925 and later years all of which were full-scale occupational as 
well as population and business censuses. Enumeration of the unemployed had also 
been made the task of the census as early as 1895. The author believes, however, 
that census questionnaires could not yield trustworthy data prior to 1950. 

The third part is devoted to a discussion of employment data which, contrary to 
the census surveys or the tabulation of the Bundesanstalt, make no claim to com- 
plete coverage. There are, first, the “by-products” resulting from three partial post 
World War II censuses: one each of non-agricultural and agricultural establish- 
ments, and a census of artisans and small trade establishments. 

A second group of data concerns individual economic sectors. While limited in 
scope they achieve at worst representative, and at best complete coverage for these 
subdivisions. The oldest series in this group is the Industrieberichterstattung, a
monthly report on production, employment, wages, and hours in the manufacturing 
industries, initiated in 1912 by the then Imperial Statistical Office and perhaps 
roughly comparable to the BLS series on employment, hours, and earnings in manu- 
facturing. Similar compilations are available for certain nonmanufacturing indus- 
tries. 

The last group of data discussed in part three consists of certain “substitutes” for
employment statistics: monthly series of membership in the health insurance sys-
tem; annual (recently semi-annual) tabulations of membership in the compulsory
accident insurance system, somewhat similar to our own workmen’s compensation
plans; and annual (at present biennial) surveys made by the industry inspectors
of all but the smallest establishments. The criteria for inspection include a minimum
number of employees or simply the use of motor-driven tools.

This book seems merely a chronological list of sources of unrelated statistical data 
due to the author’s arrangement of his material by compilers rather than by topics. 
Such an arrangement, he believes, prevents duplication and avoids cross references, 
maintaining a fair balance between chapters which otherwise would have been 
lengthy for employment and brief for unemployment information. Since the third 
part is devoted entirely to employment statistics it is difficult to follow this reason- 
ing. It is to be regretted that the author seems unacquainted with earlier books of 
this kind such as a volume on “Scope and Methods of Official Labor Statistics in 
Major Industrial Countries.”² It is divided into chapters such as Labor Markets,
Labor Exchanges, Unemployment, etc. A dozen years after the publication of the 
German source book, Laurence F. Schmeckebier, in his classic study of government
statistics adopted “the general arrangement of the material . . . by topics” because 
the book’s purpose was “to indicate what statistics are available and where they may 
be found.”³ It is a great pity that Galland failed to follow his first impulse to present
his material similarly. In that case he would have been able to find a proper place 
for the trade union statistics of unemployment which he dismisses in a footnote as 
“unofficial.” These monthly data were published for many years by the Reichsarbeitsblatt,
the official German labor gazette, beginning with 1903; for earlier years
they had been compiled quarterly. Although “partial” and “substitute” series, their
classification by industries doubtless gives them importance in the analysis of
contemporary conditions, at least until the Bundesanstalt began its own compilations 
around 1928. The reader might have benefited from being informed of their avail- 
ability. A topical presentation could also have saved the author from relegating some 
interesting material to appendixes. One of these discusses a number of surveys during 
the years 1953-55 concerning special problems of unemployment. Another appendix 
describes the almost completed plans of the Federal Statistical Office to establish 
monthly labor force reports in collaboration with the statistical offices of the Laender.⁴
The idea for this “Mikrozensus” originated during the discussions in 1949 of the
Manpower Commission of the Organization for European Economic Cooperation,
when it was found impossible to compare the potential employment estimates of 
the member countries. Several European countries besides Germany have now begun 
to conduct such surveys annually. In view of the trend to economic unification of the
European countries it is of first-rate importance that labor force estimates will soon 
become available internationally, all of which will have been compiled according to 
uniform principles. Besides discussing the sampling techniques to be used in ob- 
taining the planned monthly reports the author examines the difficulties of estab- 
lishing a schedule that will permit comparison of the “Mikrozensus” data with the 
international labor force data and with the results of German population and occupational
censuses. Comparability in the latter case is of particular importance, because
future censuses will furnish the benchmark figures for the inflation of the
sample data to the universe level. 

The author has provided a good deal of incidental information to aid the reader 
in understanding the development of some of these statistics such as discussion of 
economic conditions that caused certain surveys to be made; changes in theoretical 





2 Gebiete und Methoden der Amtlichen Arbeitsstatistik in den Wichtigsten Industriestaaten. Beitraege zur
Arbeiterstatistik No. 12. Bearbeitet im Kaiserlichen Statistischen Amte, Abt. fuer Arbeiterstatistik, Berlin: 1913. 

3 The Statistical Work of the National Government. The Institute for Government Research, Studies in Administration.
The Johns Hopkins Press, Baltimore, Md., 1925, p. 2.

4 A separate sample survey of employment in agriculture and forestry along similar lines is described in the 
third appendix. 
concepts which contributed to establishing new industrial and occupational classifi-
cations. A staggering amount of technical detail is given in each section of the book, 
but the emphasis on administrative procedures appears to be exaggerated. It is 
doubtful whether the statistician in general needs to know each law that was enacted 
for the purpose of a census or some other official statistical compilation; nor does it 
seem important to mention which entries in certain questionnaires are made in 
pencil, and why this is done. The description of manual techniques of tallying and 
coding raw data must come as a shock to readers who are living in the age of elec- 
tronic computers. Nevertheless the very completeness of these descriptions will be of 
the greatest value for users of the German statistics who do not have at their dis- 
posal libraries containing the hundreds of volumes of the Statistik des Deutschen 
Reiches and related source material on which those volumes are based. This litera- 
ture is extensively quoted in the text. At the end of each section the author has 
listed the sources proper for each type of statistics discussed. One wishes that the 
alphabetical index were more generous. 


Scientific Programming in Business and Industry. Andrew Vazsonyi. New York: John 
Wiley & Sons, Inc., 1958. Pp. xix, 474. $13.50. 


Robert L. Graves, University of Chicago


How does one teach linear programming and several related topics to non-mathematicians?
This book offers one answer: to present
business problems in such a manner that writing the relevant equations is easy and 
then to work out numerical examples in great detail, bringing in definitions and 
theorems as necessary. I suspect that few readers will agree with the author’s statement
that “This is not a book on mathematics; it is a book on business management.”

The selection and organization of the topics is excellent, both for the management 
man and for the mathematician whose help will be required on more than a few 
occasions. The method of presentation will at times annoy the mathematician. 

Teaching mathematics in the manner of this book leads to an uneven exposition. 
For example, on page 85, one is told that f(x) is to be read “ef of ex.” Then on the
next page a convex function is discussed, the area between curves is mentioned, and 
some rather involved formulas are displayed. 

To me it would have been much more satisfactory had the author said, “Here are 
some references which give the mathematical background the reader should have.” 
If it was felt that suitable references don’t exist, then each chapter could have been 
given an introductory section. Probably only about 20% of the book can be read 
by persons without some degree of mathematical maturity. 

Now let us turn to the organization of the book itself. The first chapter discusses 
the notion of a mathematical model of a business situation. It consists largely of 
reviews of various articles on operations research which have appeared in the past 
five years. Chapter 2 discusses the transportation problem in linear programming. 
A number of variants of the same problem are discussed; the non-mathematician 
should feel quite at home with this chapter. In the third chapter the equations for a 
number of different simple models are set up. It is really an introductory chapter ex- 
cept that a certain amount of formalism is required and chapter 2 has prepared the 
reader to accept this formalism. References are made to the subsequent chapters 
which deal with methods of solution. 

Chapters 4 through 9 constitute part 2 of the book which is called “Mathematical 
Programming.” Here the Simplex Method, Dynamic Programming, and the Ele- 
ments of Game Theory are discussed. The aim is to give an understanding of these 
three topics, not necessarily to discuss the most efficient computational methods.
However, one version of the simplex method of calculation is carried through in great
detail. The reader who has seen other discussions of the method may be somewhat 
confused by the author’s schematic arrangement of the various arrays. The dual 
theorem of linear programming is discussed and used to treat the generalized trans- 
portation problem. There is a chapter on the geometry of linear programming which 
includes some very useful three-dimensional representations. Convex programming 
is discussed but the bounded variable algorithm is not treated. It would have been 
very simple to include the slight modifications in the regular computational algorithm 
which would enable the reader to use this method without going to original sources. 

The third section of the book talks about Statistical Inventory Control, Assembly 
Line Scheduling, Machine Shop Scheduling, and a Model of Production Scheduling. 
This is an interesting section and parts of it draw on the author’s own work. In the 
chapter on Statistical Inventory Control it is asserted that no previous knowledge 
of statistics or probability is required. As various specific problems are discussed 
definitions are introduced in context and the normal and Poisson distributions are 
described and used. One is shown how to maximize expected profit when direct costs, 
run-out costs, ordering costs, and so on, are known. Both empirical distributions and 
the normal distribution are used. Here again it would have been better either to 
refer to a book in which the probability fundamentals could be found or to devote a 
separate section of the chapter to developing these fundamentals rather than doing 
it in context. 
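To make the expected-profit calculation concrete, here is a minimal single-period sketch. It is not Vazsonyi's procedure: the price, unit cost, salvage value, run-out (shortage) penalty and the empirical demand distribution are all hypothetical figures, and ordering costs are left out. The stock level is chosen simply by evaluating expected profit at every candidate level.

```python
# All figures below are hypothetical, chosen only for illustration.
PRICE = 10.0     # revenue per unit sold
COST = 6.0       # direct cost per unit stocked
SALVAGE = 2.0    # value recovered per unsold unit
RUNOUT = 1.0     # penalty per unit of demand that cannot be met

# Hypothetical empirical demand distribution: demand -> probability.
DEMAND = {0: 0.05, 1: 0.10, 2: 0.20, 3: 0.30, 4: 0.20, 5: 0.15}


def expected_profit(stock):
    """Expected profit of stocking `stock` units under the empirical demand."""
    total = 0.0
    for demand, prob in DEMAND.items():
        sold = min(stock, demand)
        unsold = stock - sold
        short = demand - sold
        profit = PRICE * sold + SALVAGE * unsold - COST * stock - RUNOUT * short
        total += prob * profit
    return total


def best_stock():
    """Stock level, searched over 0..max demand, with the highest expected profit."""
    return max(range(max(DEMAND) + 1), key=expected_profit)


if __name__ == "__main__":
    for q in range(max(DEMAND) + 1):
        print(q, round(expected_profit(q), 3))
    print("best stock:", best_stock())
```

With these made-up figures the best stock is 3 units (expected profit 7.1), which agrees with the familiar critical-ratio rule: stock the smallest quantity whose cumulative demand probability reaches (price - cost + run-out)/(price - salvage + run-out) = 5/9.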

The chapters on Assembly Line Scheduling and Production Scheduling are very 
well done, although, as the author says, the state of the art here is much less ad- 
vanced than in the topics treated in the other portions of the book. Assembly lines 
can be treated only in very simple cases. The work on production scheduling consists 
largely of developing a formalism to handle very large matrices of a special form. 
The functional notation used to describe Assembly Line Scheduling (including 
changing the rates of production) is handled very nicely. 

In conclusion, let me repeat only that it would have been preferable to precede 
each chapter with a section labeled “Mathematical Preliminaries.” Also it would 
have been preferable to have a more systematic bibliography rather than to rely 
on the rather sketchy foot-note presentation used. This book does seem to be better 
organized and the material better presented than is the case in several other books 
on Mathematical Programming which have appeared recently and possibly the 
author really has found the way to present mathematical material to an audience 
with a very slight mathematical background. There appear to be very few misprints 
and the typography and illustrations are excellent. 


Strategic Intelligence Production. Washington Platt. New York: Frederick A. Praeger, 
Inc., 1958. Pp. xviii, 302. $4.00. 


A. W. Marshall, The RAND Corporation


This book is written at a very abstract but elementary level concerning the problem
of intelligence production. A large part of the discussion of intelligence
production is carried on in terms of the application of principles similar to those which
are reputed to be helpful in the analysis and planning of military operations. In- 
telligence production is defined for purposes of the book as the construction, from 
isolated pieces of information collected by intelligence agencies, of finished intelli- 
gence reports. Strategic intelligence is roughly the kind of intelligence used at the 
highest level of government in deciding among alternative courses of overall policy. 
As such, it encompasses estimates of current and—more especially—future strengths 
and weaknesses of enemy countries. The treatment of the problem of constructing 
such estimates in the book is such that only someone already acquainted with intelligence
problems would find most of the discussion at all rewarding. Undoubtedly
this is in large part unavoidable, since specific intelligence problems cannot be 
discussed too openly. 

This book has a brief section on statistics and stresses the importance of statistical 
thinking as contrasted with statistical competence for manipulation of data on the 
part of intelligence analysts. However, this section is quite limited. The importance 
of the intelligence analyst understanding some of the statistical characteristics of his 
data when preparing finished intelligence estimates of a specified type is the thing 
that is stressed. However, the most important implication of the uncertain nature 
of the inputs into the intelligence production process seems to be missed. While the 
author admits that there must necessarily be a great deal of uncertainty in intelli- 
gence estimates, especially projections well into the future, he does not seem to draw 
the conclusions that a statistician would draw from this statement. For example, 
given the division of function between intelligence producers and the decision makers, 
it would seem to be especially important that a good description of the relevant 
alternative possibilities and their likelihoods be passed on to the decision makers. 
However, evidently because of pressures both from the decision makers and within
the intelligence community, the author makes it clear that a great deal of emphasis 
is in fact put on making “useful” estimates of the future which artificially reduce the 
apparent uncertainties involved. The author comes down very heavily on the side 
of preparing “useful” estimates which are estimates of the single most probable event 
or course of events. Where the most likely event or course of events is not thought
to be nearly certain, the author suggests that the prediction be qualified
by statements indicating the degree of confidence the analyst has in various parts
of the estimate. It would seem to the reviewer that a more useful estimate would be 
one which spelled out the other main alternative projections and any material use- 
ful in gauging their relative likelihoods. In many cases it seems unlikely that the 
purported most probable event in fact could be accorded a probability of greater 
than one-half, and the consequences of acting as if it were certain are likely to be 
very disagreeable. 

This is not a very good book, in my judgment, and certainly not a book that will 
be of much interest to statisticians—even though the general area seems to be potentially
one of considerable interest. The general kinds of thinking which have been
developing in modern statistics could probably be very useful in guiding the produc- 
tion of strategic intelligence. 


Bibliography on Income and Wealth, Volume VI, 1953-54. Phyllis Deane, Editor. Inter- 
national Association for Research in Income and Wealth. London: Bowes and Bowes. 
1958. Pp. 139. 37s 6d. 


Julius Margolis, University of California, Berkeley


This is the sixth in a continuing series of bibliographical volumes embracing the
fields of national income and wealth, social accounting and its uses in national
budgeting or model building, international comparisons, input-output studies, labor 
force, size distribution and related topics. The literature covered includes the pres- 
entation of data, concepts, methodology and studies using the material by both 
government agencies and economic analysts. The first year covered by the initial volume
in the series was 1937.

Correspondents from 23 countries, the United Nations and Pan-American Union 
report on the literature published in the journals, government reports, books and 
business literature of their countries. Each reference has a succinct and informative 
abstract in English, except for the French references, which retain their original abstracts.
All titles except French and Spanish ones are translated into English. The references are intelligently
classified and cross-indexed. Each reference has a detailed statement of source, 
usually including publisher’s address and price. This is especially helpful in the case 
of government reports. 

For most countries and topics the volume will prove invaluable as an index of the 
literature. Unfortunately some countries are neglected. There are no references to 
China and the Russian and Polish references are to items published outside of those 
countries. 


Corporate Bond Quality and Investor Experience. W. Braddock Hickman. Princeton Uni- 
versity Press, 1958. Pp. xxix, 536. $10.00. 


Ira O. Scott, Columbia University


This is the second of three studies in corporate bond financing prepared by the
author for the National Bureau of Economic Research. In these studies, the 
author reports and analyzes data covering the bonded indebtedness of corporations 
in the railroad, public utility, and industrial fields during the period 1900 to 1944. 
The first study, The Volume of Corporate Bond Financing since 1900 (Princeton 
University Press, 1953), was devoted to the broad trends appearing in the aggregate 
data. The third volume, Statistical Measures of Corporate Bond Characteristics and
Experience, will provide additional tabulations, a description of estimating procedures,
and suggestions for using the data.

The present book compares investor experience with various prospective measures 
of quality. The latter include agency ratings, legal status, market rating (difference 
between promised yield and yield on best outstanding bonds with same maturity), 
times-charges-earned ratio (ratio of income before fixed charges to charges), mar- 
gin of safety (ratio of net income to gross income), lien position, size of issue, and 
asset size of obligor. Measures of investor experience include default rates (propor- 
tion of offerings that went into default at any time between offering and extinguish- 
ment), realized yields, and loss rates (differences between realized yield and promised 
yield). 
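The loss-rate measure can be illustrated with a small sketch. The three groups and their yields are invented, and weighting each group's loss rate by par amount is an assumption about how an aggregate rate might be formed, not necessarily Hickman's exact procedure. The loss rate is taken here as promised minus realized yield, so a positive value is a loss. The sketch shows how losses on defaulted issues can be exactly offset by gains on other issues, leaving a net aggregate loss rate of zero.

```python
from dataclasses import dataclass


@dataclass
class BondGroup:
    name: str
    par: float        # par amount (hypothetical units)
    promised: float   # promised yield at offering, per cent per annum
    realized: float   # realized yield, per cent per annum


def loss_rate(group):
    """Promised minus realized yield: positive means a loss, negative a gain."""
    return group.promised - group.realized


def aggregate_loss_rate(groups):
    """Par-weighted average of the group loss rates (an assumed weighting)."""
    total_par = sum(g.par for g in groups)
    return sum(g.par * loss_rate(g) for g in groups) / total_par


# Invented figures, loosely echoing a 10/20/70 per cent split by par amount.
GROUPS = [
    BondGroup("paid in full at maturity", par=10.0, promised=4.0, realized=4.0),
    BondGroup("defaulted", par=20.0, promised=5.0, realized=1.5),
    BondGroup("called or still outstanding", par=70.0, promised=4.0, realized=5.0),
]

if __name__ == "__main__":
    for g in GROUPS:
        print(f"{g.name}: loss rate {loss_rate(g):+.2f}")
    print(f"aggregate: {aggregate_loss_rate(GROUPS):+.2f}")
```

Issues paid in full at maturity contribute nothing, since their realized and promised yields coincide; the sign of the aggregate then depends only on how gains elsewhere balance the default losses.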

Among the findings, these seem the most interesting: 

(1) The aggregate net loss rate was zero. About 10 per cent of the $71.5 billion 
par amount of corporate bonds analyzed was paid in full at maturity, and about 20 per
cent went into default. The rest were either extinguished by call or remained outstanding
at the end of the period studied. Realized yields necessarily equalled
promised yields for those extinguished by payment in full at maturity, while capital 
losses incurred on the issues in default were just offset by capital gains either re- 
alized or accrued on those outstanding at the end of the period. This result is per- 
haps surprising, since the data were heavily weighted by the high default rates of 
the 1930’s. Were the study extended from 1944 to the present, the record of cor- 
porate capacity to service long-term debt would be even more favorable. 

(2) Quality and quantity varied inversely. The proportion of offerings that later 
defaulted was high in years of heavy financing. This finding seems to imply that 
there were increases in borrower and investor confidence during an expansion in 
investment activity which were not substantiated by later economic developments. 
But does this really say more than that, during the period studied, an upswing was 
followed by a downswing? Nor should one infer from this finding that the quality of 
the high volume of post-war financing is similarly suspect. The structure of the
economy and of economic policy has undergone fundamental changes.

(3) The growth of large corporate borrowers and institutional lenders did not re- 
sult in an increase in the relative importance of large issues. Extreme inequality in 
the size of issues was apparent. In 1944, for example, issues outstanding in amounts
of less than $5 million accounted for two-thirds of the total number of issues but 
only one-tenth of the par amount. Moreover, during the period being studied, the 
average size of outstanding issues nearly quadrupled. However, a given proportion of
the total number of issues, distributed according to size, accounted for about the 
same proportion of the total par amount in 1944 as in 1900. An increase in the share 
of larger issues during the earlier part of the period was roughly canceled by a later 
decline in their share. 

(4) Prospective quality measures provided reliable rankings with respect to the 
risk of default. There was an inverse relationship between retrospective quality as 
measured by default rates and each of the major prospective measures of quality
(agency ratings, legal status, etc.). Although different weights were assigned, the same
criteria were used by each of the rating systems. The various systems were more 
efficient, however, in ranking issues within an industry than in assessing risks accord- 
ing to industry groups, with the rails as the chief source of difficulty. With respect 
to cyclical swings in investor confidence, the investment agencies proved to be more 
susceptible than the market. To some, this is troublesome since the agency ratings 
are supposed to measure “intrinsic quality.” However, such a measure would be of 
doubtful practical value, since the riskiness attached to the ownership of an asset is 
not independent of its economic environment. Nor should supervisory authorities 
charged with responsibility for institutional liquidity and solvency be expected to 
ignore cyclical swings in the ability of a given portfolio to meet these standards. The 
burden of ameliorating the cycle surely lies elsewhere. 

(5) The market undervalued low-quality issues. Higher default losses of low-grade
issues were more than offset by the higher promised yields exacted by investors.
This conclusion, however, applies only to the broad aggregates over long periods of
time. Thus small investors, who held only high-grade issues, probably fared better.
That higher returns were actually obtained on low-grade issues reflected (a) the
premium for risk-bearing required by small investors and (b) the regulation of
portfolios large enough to eliminate the need for a risk premium through
diversification.

This book and its two companion volumes are must reading for practitioners of 
portfolio management as well as students of the business cycle. 


The Economics of Discrimination. Gary Becker. University of Chicago Press, Chicago, 
Illinois, 1957. Pp. 137. $3.50.


ARMEN A. ALCHIAN, University of California, Los Angeles 


To say that this is the best book on the manifestations of discrimination would be
no great compliment, since so little has been done elsewhere. But praise is due 
Becker for having written so superb an analysis in a relatively unexplored but important
field. The reader will gain a richer understanding of discrimination. Furthermore,
many preconceptions and errors will be removed—to judge by this reviewer’s
experience. The theoretical analysis is beautifully, if concisely, presented and many 
of the empirical implications are tested and measured. In other instances, illustra- 
tions are given of how to test the implications with observable quantitative data. 

Were this review being written for a professional economics journal much would 
be said about its theoretical model. Emphasis would be placed on the fact that 
Becker has used the classical economic postulate that people try to maximize their 
utility—and not their wealth. Thus he treats wealth as one component or variable 
in a person’s decision or utility function. Other components are of a non-pecuniary 
nature, such as the working environment, and the kinds of people with whom a 
person associates in earning his wealth. This last attribute—the personal features of
one’s employees or employers—is such that some individuals are willing to pay something
in the form of a reduced income to be associated with some people instead of
with others. When discrimination occurs, the discriminator must in fact either pay 
or forfeit income. This simple, but powerful, way of looking at the matter gets at 
the essence of prejudice and discrimination. Thus an employer who prefers to have 
blonde employees rather than equally productive brunettes will act as if employing
a brunette cost him more than employing a blonde. Only at lower wages for brunettes than for
blondes would he hire brunettes. This price difference becomes a basis for measuring 
discrimination. 
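The blonde/brunette example can be written out as a short sketch. Here d is a non-negative number (in the spirit of what Becker calls a discrimination coefficient) by which the employer in effect inflates the money wage of the group he dislikes. The function names and wage figures are hypothetical. With equally productive workers, the employer hires brunettes only when their inflated wage does not exceed the blonde wage, and for an employer who is just indifferent the wage ratio gives a measure of d.

```python
def perceived_wage(money_wage, d):
    """Wage the employer acts on: the money wage inflated by the coefficient d >= 0."""
    return money_wage * (1.0 + d)


def hires_brunette(blonde_wage, brunette_wage, d):
    """True if an employer with coefficient d is willing to hire a brunette,
    assuming blondes and brunettes are equally productive."""
    return perceived_wage(brunette_wage, d) <= blonde_wage


def implied_coefficient(blonde_wage, brunette_wage):
    """Coefficient d at which an employer is exactly indifferent at these wages."""
    return blonde_wage / brunette_wage - 1.0


if __name__ == "__main__":
    # Hypothetical wages: with d = 0.1, equal wages deter hiring brunettes,
    # but a brunette wage 10 per cent lower is enough to make them acceptable.
    print(hires_brunette(100.0, 100.0, 0.1))
    print(hires_brunette(100.0, 90.0, 0.1))
    print(implied_coefficient(100.0, 80.0))
```

Note that the employer with d > 0 forgoes income whenever he passes over cheaper but equally productive brunettes, which is the sense in which the discriminator "must in fact either pay or forfeit income."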

Becker compares this approach with others commonly used and shows wherein 
the others are inconsistent or inadequate. He then shows that this application of 
basic economic ideas is fruitful. He shows that tastes for discrimination are an im- 
portant part of the theory explaining actual discrimination, but the analysis is not 
simple. Tastes for discrimination can result in market price discrimination as well as 
in market segregation, two frequently confused concepts. Becker is careful to keep 
segregation distinct from discrimination phenomena, even though both are implied 
by a taste for discrimination. For example, groupings of Negroes and whites con- 
stitute segregation, but that does not mean that the rents for similar housing must 
be different. The latter implies discrimination, whereas the former does not. Becker 
is careful to note that no emphasis should be placed on the distinction between dis- 
crimination in “favor of” as distinct from that “against.” In other words a theory 
based on “hatred” of one group is not easily distinguished empirically from one based 
on love of the other group. But at the same time, conclusions about normative issues 
may depend on whether hatred or love is assumed to motivate decisions. 

But this review is written primarily for statisticians. What is there in it that would 
interest them? Aside from its general excellence as an analysis of “discrimination” 
its interest to statisticians as professionals is as an example of the application of 
available observable quantitative data to what are frequently regarded as non-quan- 
titative subjective phenomena. The book serves as an example of the fact that sta- 
tistical analysis can be applied to measuring sociological phenomena, in a sense more 
profound than merely counting how many people do or do not say they have 
certain tastes or preferences. A measure of the effects of these tastes can be made, as 
Becker shows. 

For example, the effects of discrimination between Negroes and whites on the incomes
of labor and capital are revealed. He shows that under conditions that
appear to be satisfied, white capitalists suffer from discrimination against Negroes
while white laborers gain. He obtains different implications about the effects on 
incomes of two minority groups, Indians and Negroes, consequent to differences in 
the degree of segregation. The wide range of phenomena covered can be suggested 
by listing some of those explicitly discussed by Becker: discrimination by minorities, 
discrimination by single employers and by the group of employers, differences in 
discrimination between competitive and monopolistic employers, discrimination by 
employees and by unions, discrimination by consumers and by government. The 
effects of these tastes for discrimination are tested with market phenomena, comparing
North and South, different industries, retailing versus manufacturing, different
professions, and farming versus urban occupations. In each case measurements are
made and hypotheses tested. 

If Becker had had an eye on royalties, more elaboration of obvious intermediate 
steps would have produced easier reader acceptance. Nevertheless this book is a 
major contribution to our understanding of discrimination and to its measurement. 
Finite Queuing Tables. L. G. Peck and R. N. Hazelwood. New York: John Wiley & Sons,
Inc., 1958, for the Operations Research Society of America. Pp. xvi, 210. $8.50. 


Dennis V. Lindley, Cambridge University


The queueing situation considered in the tables is perhaps best described in the
language appropriate when a number, N, of similar machines in a factory is being 
serviced by M repairmen. It is assumed that the probability that a working machine 
will break down in any small interval of time is proportional only to the length of 
the small interval and does not depend on any other circumstances, such as the 
length of time that it has been working or the state of the other machines or the 
repairmen. The constant of proportionality is the same for all machines. In other 
words, the distribution of trouble-free service is negative exponential with the same 
mean, U, say, for all machines. It is similarly assumed that the distribution of repair 
time (it takes one man to service a machine) is negative exponential with mean, T, 
say, the same for all machines and repairmen, independently of other circumstances 
such as the number of broken machines. When a machine breaks down it is immediately
serviced, if a repairman is available, but otherwise waits until one is available.
This latter period is referred to as its waiting time. The model might also be
expressed in terms of customers at a shop, subscribers making telephone calls, etc. 
The tables provide, for given values of N, M and X=T/(T+U), the value of D, 
the probability that a machine, when it breaks down, will have to wait before it is 
attended to by a repairman; and F, an efficiency factor, equal to one minus the 
long-run proportion of time spent broken, waiting for repair. The choice of X as 
argument is sensible because it varies between 0 and 1. The choice of D and F as 
the quantities tabulated is excellent because it is possible, by carrying out a few
simple calculations, to determine the values of most parameters of interest in the
system. For example, the average number of repairmen occupied at any time equals 
FNX. The range of values of N is 4(1)26(2)70(5)170(10)250 and of X 0.001(.001) 
0.026(.002)0.070(.005)0.170(.01)0.34(.02)0.60(.05)0.95. The values of M chosen are 
generally such that neither D nor F is too near 0 or 1, but there are some unexplained 
curiosities. For example, with N =170 and X =0.58 the values of M are 110(3)107(1) 
88 but for X =0.60 they are 113(2)109(1)91. The gap of 3 in one case and 2 in another 
is puzzling. D and F are given to three decimal places. No differences are provided 
and no mention is made of interpolation. The tables are not conveniently laid out for 
this purpose (though the layout is satisfactory in other respects). However, in most 
cases it seems possible to interpolate satisfactorily at least to two places of decimals. 
The tables have been calculated on Univac and produced by photo-offset from the
product of the Univac high-speed printer. The result is some poor printing, not bad 
enough to cause serious doubt as to the value of any digit but enough to make one 
look twice. It is not clear why calculating machines should have such bad founts, 
but it seems general, and Univac is no exception. The old-style fount with digits of 
different heights is so superior to the uniformly fat digits used here that it is surprising
the latter should have been used.

The most unsatisfactory feature of the book is its poor introduction. It is badly 
printed. It contains numerous errors: for example line one on p. ix should read 
£=\T and the T should be omitted from the next line (a most confusing error); 
in (5) n’ should be n!; the first part of (11) is not valid when n= M. But its principal 
defect is that it does not contain the formulas by which D and F were calculated. 
For some reason it begins with a rather complete account of the case N = ∞ (which
is not tabulated, the formulas being simple) and then proceeds to a much more 
superficial account of the case of finite N. Although formulas for D and F are given 
in the former case they are omitted in the latter. The first impression is that the 
formulas for N = ∞ have been used in the calculations, but fortunately they have
not. The correct formulas which yield the values given (at least in a sample of two!) 
are

$$D = \sum_{N \ge s \ge M} P_s\,(N - s) \Big/ \sum_{N \ge s \ge 0} P_s\,(N - s),$$

$$F = 1 - \sum_{N \ge s \ge M} P_s\,(s - M)/N.$$

($P_s$ is the probability that $s$ machines are not working, i.e. either being repaired or
waiting.) Neither is so easy to derive that it should be omitted. It is not clear either
that the values of D and F are correct to the three places given. We are only told
how the $P_s$ were calculated: the description concludes with the words “the values
were then rounded to three significant figures.” Presumably the values referred to
are those of $P_s$, in which case D and F are almost certainly in error in the last place
on some occasions. A little differencing confirms this suggestion.
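The quantities discussed above can be computed directly. The sketch below assumes the model described at the start of this review (negative exponential running and repair times, N machines, M repairmen) and treats it as a birth-and-death process: with s machines down, breakdowns occur at a rate proportional to N - s and repairs at a rate proportional to min(s, M). D is then the share of breakdowns, weighted by N - s, that occur in states with s ≥ M, and F is one minus the mean number of machines waiting divided by N, as in the reviewer's formulas. The function names and example values are hypothetical, and this is not the procedure used to compute the published tables. The final check confirms that the average number of busy repairmen equals FNX.

```python
import math


def state_probabilities(N, M, X):
    """P[s] = long-run probability that s of the N machines are not working.

    X = T/(T+U), so T/U = X/(1-X). Uses the birth-and-death balance
    P[s] / P[s-1] = (N - s + 1) * (T/U) / min(s, M), worked in logarithms
    so that large N does not overflow.
    """
    if N < 1 or M < 1:
        raise ValueError("need at least one machine and one repairman")
    if not 0.0 < X < 1.0:
        raise ValueError("X must lie strictly between 0 and 1")
    ratio = X / (1.0 - X)  # equals T/U
    log_w = [0.0]
    for s in range(1, N + 1):
        log_w.append(log_w[-1] + math.log((N - s + 1) * ratio / min(s, M)))
    peak = max(log_w)
    weights = [math.exp(v - peak) for v in log_w]
    total = sum(weights)
    return [w / total for w in weights]


def delay_probability(P, N, M):
    """D: probability that a machine, when it breaks down, finds all M repairmen busy.
    Only running machines can break down, so state s is weighted by N - s."""
    waiting = sum(P[s] * (N - s) for s in range(M, N + 1))
    all_states = sum(P[s] * (N - s) for s in range(N + 1))
    return waiting / all_states


def efficiency_factor(P, N, M):
    """F: one minus the long-run fraction of machine time spent waiting for repair."""
    mean_waiting = sum(P[s] * (s - M) for s in range(M, N + 1))
    return 1.0 - mean_waiting / N


if __name__ == "__main__":
    N, M, X = 10, 2, 0.1  # hypothetical example values
    P = state_probabilities(N, M, X)
    D = delay_probability(P, N, M)
    F = efficiency_factor(P, N, M)
    busy = sum(min(s, M) * P[s] for s in range(N + 1))
    print(f"D = {D:.3f}, F = {F:.3f}")
    print(f"busy repairmen = {busy:.6f}, FNX = {F * N * X:.6f}")
```

When M = N no machine ever waits, so the sketch gives D = 0 and F = 1, a quick sanity check on the formulas.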

How valuable these tables are likely to be in practice depends very much on how
important the assumptions of negative exponential distributions for machine time
and repair time are. If a manufacturer has a situation in which the distributions obtain,
then undoubtedly the tables will be valuable to him in answering such questions
as how many repairmen to employ, how beneficial preventive maintenance could
be, etc. Furthermore the tables are well designed for such purposes. But suppose
the distributions depart from the negative exponential form, can the tables still be 
used without serious error? To use a term which is fashionable today in statistics, 
how “robust” are the formulae to changes in the underlying distributions? If it could 
happen that minor changes were enough to alter F from 90% to 70% then the manu- 
facturer may well be lead astray seriously. So far as I am aware little is known about 
the robustness. Quite a bit is known in the case M=1, N= and here the assump- 
tion of constant service-time is known to give very different answers from the nega- 
tive exponential distribution. It would be interesting, if Univac could be spared, to 
calculate the tables for constant service time. Until some more results are available 
the use of the tables may be misleading, or, of course, it may not. Student’s t-table is 
robust and it is to be hoped that the present one is too. 
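For the limiting case M = 1, N = ∞ mentioned above, the standard Pollaczek-Khinchine formula makes the sensitivity concrete: with Poisson arrivals and one server, the mean wait in the queue depends on the service-time distribution through its second moment. At the same mean service time, constant service therefore gives exactly half the mean wait of exponential service. The sketch below covers only this infinite-source case, not the finite-N tables, and the function name is hypothetical.

```python
def pk_mean_wait(arrival_rate: float, mean_service: float, service_second_moment: float) -> float:
    """Mean time in queue for one server with Poisson arrivals (Pollaczek-Khinchine)."""
    rho = arrival_rate * mean_service  # utilisation of the single repairman
    if rho >= 1:
        raise ValueError("queue is unstable: utilisation must be below 1")
    return arrival_rate * service_second_moment / (2 * (1 - rho))


if __name__ == "__main__":
    lam, s = 0.8, 1.0  # illustrative: arrival rate 0.8, mean service time 1
    exponential = pk_mean_wait(lam, s, 2 * s**2)  # E[S^2] = 2 s^2 for exponential service
    constant = pk_mean_wait(lam, s, s**2)         # E[S^2] = s^2 for constant service
    print(exponential, constant)
```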

It is curious that no mention is made of Erlang, who was the first to solve the 
finite queueing problem. Erlang also wrote on the construction of mathematical 
tables and it would have been most interesting to see what he would have tabulated 
had he had the services of Univac available. 


The National Economic Accounts of the United States. Hearings Before the Subcommittee 
on Economic Statistics. Joint Economic Committee, Congress of the United States, 85th 
Congress, 1st Session. Washington, D. C.: United States Government Printing Office, 1957.
Pp. iii, 302. $0.75. 


Arthur S. Goldberger, Stanford University


In 1934, the document presenting the first official estimates of national income in
the United States noted that “there is general paucity of data on entrepreneurial
incomes and the estimates relating to this income type are the most subject to doubt.” 
During the intervening quarter-century, vast improvements have come in the con- 
ceptual and empirical bases of the national income statistics. The continuing de- 
bates on conceptual issues have sharpened the understanding of the national income 
accounts, and considerable effort, imagination, and ingenuity have been exerted to 
obtain consistent and timely estimates of activity in a complex economy. Now, an
expert committee has reviewed the present official national income statistics and
reports, “for the immediate future, the most important single step that could be
taken to improve the accuracy of the national accounts would be to improve the 
data for nonfarm sole proprietorships and partnerships [i.e., for entrepreneurs]. . . . 
Although estimates of these items are currently included in the various accounts, 
they can be regarded as little more than informed guesses for the small-business 
sector.” 

This somewhat unhappy situation can hardly be considered the fault of the Na- 
tional Income Division of the Commerce Department, which produces the official 
national income accounts. If there remains dissatisfaction with this remarkable body 
of data, it is in good part attributable to the relative scarcity of raw statistical in- 
formation. As George Jaszi, Chief of the National Income Division, remarks, “there 
has in the last decade been no significant addition to the quantity or quality of the 
primary statistical data that are the raw materials of national income estimates.” 

It is this state of affairs, together with the recent development of alternative
macro-economic accounting schemes, that led to the present volume. In November
1956, at the request of the Office of Statistical Standards, the National Bureau of 
Economic Research set up a nine-man National Accounts Review Committee. 

The Committee undertook a review of the existing national accounting schemes
with a view to recommending improvements and additions. Under the chairmanship of
Raymond Goldsmith, the committee members drew on their own experience and on
conferences with producers and users of the accounts to prepare their report, which
comprises the latter two hundred pages of this volume. The first one hundred pages
contain the testimony of the committee members and several others at a two-day
congressional hearing held in October 1957.

The Committee’s report will be of interest to all serious users of the United States 
national economic accounts. The existing accounts are surveyed, at points with re- 
gard to conceptual matters, but more emphatically with regard to the underlying 
source material. The gaps in this material are highlighted; they are numerous and 
wide. Besides unincorporated business, areas of particular weakness include
construction and, remarkably, government. The recommendations are detailed and
range from speedier tabulations of tax returns to more extensive use of field surveys. 

The implementation of the Committee’s recommendations is clearly dependent 
upon budgetary appropriations. Still, the proposals in this volume give the economist 
a view of the future path of official national economic accounting. He learns that 
estimates of quarterly constant dollar gross national product and of replacement
cost depreciation may be forthcoming. Further, the development of an integrated 
set of national economic accounts, encompassing flow-of-funds, input-output, and 
balance sheet material as well as the traditional income and product accounts ap- 
pears as at least a long-range goal. 

In the absence of numerical estimates of sampling error, the experts’ qualitative
judgment of the reliability of the components of the national accounts is in order. For
the statistician, this discussion may be of special interest.

Also present are refreshing comments on the official household income data, which 
are drawn on for the national income totals as well as for income size distributions. 
Personal income tax returns are the basic source for these data. In 1949, an audit of
these returns found that net profits of nonfarm entrepreneurs were understated by 
an average of almost 20%, in some industries by over 50%. From 1950 to 1953, a 
boom period, the number of tax returns reporting high incomes declined. Finally, 
there has been an increasing resort to what the Committee graciously refers to as 
“changes in the methods of income disbursement.” It notes, “the reliance on pensions, 
deferred compensation and stock options in lieu of cash wages and salaries, the con-
version of ordinary incomes into capital gains, the growth in importance of business 
expense accounts that cover items of personal consumption, and the use of personal 
trusts to split incomes among members of the family is likely to have had an impor- 
tant impact on the relative size distribution of income.” Personal income tax returns 
do no justice to these changes. The Committee recommends that private research 
organizations be encouraged to study this area; further, that the tabulation of the 
1948-50 audits be completed, and that such studies be repeated every fifth year. 

References to the gaps in raw data, especially benchmarks, abound in the report. 
The allocation of automobile purchases between the business and household sectors 
is based upon traffic surveys taken in the 1930’s. Why? Ingenuity and information 
are not perfect substitutes; information is not a free good. The budget of the National 
Income Division has been cut in the face of increased demands for its product. In 
1957, it operated on a budget of $242,000, with a staff of 35, including secretaries. 

There may be a historical parallel for all this. If so, its implications are rather 
disturbing. Some time ago, a group of civil servants was instructed, “begone to your 
work; you shall get no straw, but you must deliver your quantity of bricks.” Reliable 
reports indicate that a series of plagues was the consequence. 


Colonial Agriculture Statistics: The Organisation of Field Work. Colonial Research
Publication No. 22. K. E. Hunt. London, HMSO (Mimeograph) 1957. Pp. viii, 123. 14
shillings.


Philip M. Raup, University of Minnesota


In purpose and organization, this is a handbook for administrators, statisticians,
and field supervisors concerned with the collection of agricultural statistics in 
underdeveloped areas. The first portion outlines the use of reconnaissance and case 
studies to determine the overall agricultural picture where little information exists. 
This is followed by an analysis of sampling techniques appropriate to remote areas 
and primitive agricultural practices. In this regard, the term colonial in the title 
may be misleading. The bulk of the material refers to the generalized problems of 
data collection under primitive conditions in tropic and subtropic agricultural areas. 

The agricultural economist or statistician will find some intriguing viewpoints in 
this small volume with respect to the problems of achieving maximum efficiency in 
the use of scarce personnel. It is interesting to note, for example, that the sampling 
techniques from advanced countries have some of their most direct application in 
the least advanced countries, and for similar reasons: Scarcity of available skilled 
labor. In the underdeveloped country the scarcity is absolute; in the highly developed 
country it is relative as reflected in high labor cost. There seems to be a consequent 
wide possibility for the transferral of some sophisticated data collecting techniques 
directly to the problem situations in primitive agricultural settings. The collection 
of total enumeration data by census-type procedures is apparently becoming a 
luxury, reserved only for the moderately developed countries where relative labor 
costs still have not risen to a point precluding the enumerative approach. 

The reader is impressed with the level of statistical detail required for good work 
in tropic and subtropic agricultural areas. In the temperate latitudes we are ac- 
customed to data based on rectangular fields devoted to single crops, clearly defin- 
able in an areal sense, and with definite seed times and harvest times. The collection 
of agricultural data in tropic and subtropic areas is much more complex. Throughout
many regions of the globe, the village is the minimum unit of decision-making
in agriculture, and must serve as the minimum unit of enumeration or sampling in 
agricultural data collection. This requires the development of a variety of techniques 
for standardizing the concept of a “village” and for treating it statistically in areas
where villages migrate from place to place, and disappear or reform with bewilder- 
ing rapidity. 

The practices of double-cropping or of partial-harvesting, together with the
horticultural nature of much agricultural production in primitive economies, make any
simple count of acres under given crops out of the question. Add to this the complica- 
tion of heavy dependence on bush crops or tree crops, which may form an additional 
part of a complex of inter-planted, double-cropped, and partial-harvested forms of 
land use, and one begins to appreciate the statistical headaches faced in underde- 
veloped agricultural economies. 

The collection of agricultural data in underdeveloped areas is more difficult and 
complex than in developed economies for another reason: the impossibility of using 
mail questionnaires or self-reporting techniques. This handicap leads to the develop- 
ment of a variety of intriguing statistical innovations of a spot-check or sub-sampling 
variety. This publication includes usable descriptions of several of these methods, in- 
cluding line and transect methods of sampling irregularly shaped areas characterized 
by noncontiguous patches or “hills” of cropped land. 
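The principle behind the line (transect) methods mentioned above can be shown with a toy calculation: the proportion of total transect length that falls on cropped land estimates the proportion of the area that is cropped. The sketch below is illustrative only and is not taken from the handbook; it assumes the cropped “hills” are non-overlapping circles lying wholly inside a unit square and uses evenly spaced straight transects. The patch layout and function names are hypothetical.

```python
import math


def chord_length(y: float, cx: float, cy: float, r: float) -> float:
    """Length of the horizontal line at height y that lies inside a circular patch."""
    d = abs(y - cy)
    return 2.0 * math.sqrt(r * r - d * d) if d < r else 0.0


def line_intercept_fraction(patches: list[tuple[float, float, float]], n_lines: int) -> float:
    """Estimate the cropped fraction of the unit square from horizontal transects.

    Assumes each patch is a circle (cx, cy, r) lying wholly inside the square and
    that patches do not overlap; transects are systematic at y = (k + 0.5) / n_lines.
    """
    inside = 0.0
    for k in range(n_lines):
        y = (k + 0.5) / n_lines
        inside += sum(chord_length(y, cx, cy, r) for cx, cy, r in patches)
    # each transect has length 1, so the total transect length is n_lines
    return inside / n_lines


if __name__ == "__main__":
    hills = [(0.2, 0.3, 0.1), (0.6, 0.7, 0.15), (0.75, 0.25, 0.08)]  # hypothetical layout
    true_fraction = sum(math.pi * r * r for _, _, r in hills)
    for n in (5, 20, 100):
        print(n, round(line_intercept_fraction(hills, n), 4), round(true_fraction, 4))
```

Only lengths along the lines need to be measured, which is why such methods suit irregular, scattered plots where outlining each field would be impractical.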

The concluding forty pages of the report are devoted to a discussion of administra- 
tive arrangements, and training of staff, together with several appendixes on cata- 
loguing data, preparation of forms, schedules and questionnaires, measuring, weigh- 
ing, mapping, surveying, and related details of statistical administration. 

More attention could have been devoted to the application of aerial photographic
techniques. The use of aerial photogrammetry in the preparation of base maps is now
widespread, but apparently much work remains to be done in adapting aerial photo- 
graphic techniques to the problem of agricultural data collection. Reference is made 
in this report to the common American practice of “driving the roads” from year to 
year, with simple mileage indicators calibrated to permit calculation of the number 
of miles of field boundaries fronting on a given route. This has proved a quick and 
reliable estimating device for crop acres. A variant form of this technique, involving 
the repeated flying of a fixed route, may have application in areas where agriculture
is primitive, roads deficient, and fields randomly bounded and non-contiguous. 

There is an almost total lack of any discussion of the problems of data collection 
that involve land tenure. Concepts of ownership, lease, sale and mortgaging are so 
variously developed in the underdeveloped world that tenure statistics are complex 
and difficult to assemble. Admitting the difficulty of the problem, some discussion 
of this aspect of agricultural statistics would have strengthened this report. 

A great deal of practical information is packed into this modest publication. It 
should prove of invaluable aid to agricultural officials, economists and statisticians 
in a variety of countries not confined to those infinitely complex agricultural areas 
that we lightly classify as underdeveloped. 


National Atlas of the United States. United States Geological Survey. Loose-leaf maps,
sheet size 16 by 22 inches; various scales, principally 1:10,000,000 and 1:30,000,000;
various dates; various prices, principally 10 and 15 cents.


Harold M. Mayer, University of Chicago


Many geographers, economists, statisticians, and others have long felt the need
for a series of maps, on uniform scale to permit comparisons and study of 
covariation, of significant physical, demographic, cultural, and economic distribu- 
tions within the United States. A Committee on a National Atlas of the United 
States has been at work for several years, representing various Federal agencies, 
under the auspices of the National Academy of Sciences-National Research Council, 
to develop criteria and machinery for the production of a National Atlas, comparable
in scope to those produced by a number of other nations. The result was a series of 
recommendations relative to scope and format, including sheet size and scales, for 
the production of maps by governmental agencies and other organizations, in which 
data relative to the entire United States could be presented for comparative study. 
It was recommended that, unlike the national atlases of most other countries, the 
sheets be prepared in loose-leaf form, so that users could purchase only those sheets 
that they desired, and so that the atlas could be kept current as revisions of individual 
sheets are issued. 

The first 56 sheets to be issued in the standard format are of uniformly high 
quality and utility. They cover a variety of topics, and give promise that the atlas 
will be an extremely useful device for all who are interested in the distributive aspects 
of economic and social, as well as physical phenomena, in the United States. 

The sheets so far issued include maps of the urban and rural population of the 
United States based upon the 1950 census, a map of types of farming, a series of 12 
climatic maps issued by the Weather Bureau showing, by means of isopleths, the 
standard deviation of monthly average temperature throughout the United States, 
and a series of agricultural distribution maps produced by the Bureau of the Census 
showing such things as: number and distribution of farms (dot map) and increase or 
decrease in number of farms, land in farms, total cropland, harvested cropland, 
pastureland, farm woodland, irrigated land, conservation farming practices, types 
of commercial farms, and the distribution of certain crops. 

The variety of maps which lend themselves to production in the standard format 
is, of course, nearly infinite, but each agency is producing those which it believes 
to be most significant and useful. 

Sheet sizes are mainly 16 by 22 inches, but in the case of subjects or patterns too
complex to be represented on such sheets, larger sheets readily foldable to that size 
are printed. Some subjects are represented by several maps on the same standard- 
size sheet, and in some instances there is brief textual explanation of the significant 
patterns on the reverse side of the sheet. In instances where the topic is presented 
on a single map per sheet, the standard scale is 1:10,000,000; some sheets have 
multiple maps on smaller scales. Generous use is made of multiple colors where they 
assist in presenting the material. Conic projections are used, either the Albers
equal-area or the Lambert conformal, and the differences between them at the standard
scales are not detectable, thereby permitting direct superposition or comparison of any
combination of distributions which are represented at the standard scales.
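The difference between the two projection families named above can be checked numerically: the Albers conic preserves area, whereas the Lambert conformal conic preserves local shape instead. The sketch below uses the standard spherical Albers formulas with standard parallels 29.5° N and 45.5° N and origin 23° N, 96° W, a conventional choice for the conterminous United States. Those parameters are an assumption, since the review does not say which the atlas uses. The code estimates the projection’s Jacobian by central differences and compares it with the spherical area element; a ratio of 1 everywhere is what “equal-area” means.

```python
import math


def albers_xy(lam, phi, lam0, phi0, phi1, phi2, R=1.0):
    """Spherical Albers equal-area conic projection (all angles in radians)."""
    n = (math.sin(phi1) + math.sin(phi2)) / 2
    C = math.cos(phi1) ** 2 + 2 * n * math.sin(phi1)
    rho = R * math.sqrt(C - 2 * n * math.sin(phi)) / n
    rho0 = R * math.sqrt(C - 2 * n * math.sin(phi0)) / n
    theta = n * (lam - lam0)
    return rho * math.sin(theta), rho0 - rho * math.cos(theta)


# Assumed parameters: central meridian 96 W, origin 23 N, standard parallels 29.5 N and 45.5 N.
US_PARAMS = tuple(math.radians(d) for d in (-96.0, 23.0, 29.5, 45.5))


def area_scale(lon_deg: float, lat_deg: float, h: float = 1e-6) -> float:
    """Projected area / spherical area near a point; 1.0 means area is preserved."""
    lam, phi = math.radians(lon_deg), math.radians(lat_deg)

    def f(l, p):
        return albers_xy(l, p, *US_PARAMS)

    # central differences for the Jacobian of (x, y) with respect to (lambda, phi)
    (xa, ya), (xb, yb) = f(lam + h, phi), f(lam - h, phi)
    (xc, yc), (xd, yd) = f(lam, phi + h), f(lam, phi - h)
    dx_dlam, dy_dlam = (xa - xb) / (2 * h), (ya - yb) / (2 * h)
    dx_dphi, dy_dphi = (xc - xd) / (2 * h), (yc - yd) / (2 * h)
    jacobian = dx_dlam * dy_dphi - dx_dphi * dy_dlam
    # on a unit sphere the area element is cos(phi) dlambda dphi
    return abs(jacobian) / math.cos(phi)


if __name__ == "__main__":
    for lon, lat in [(-120, 48), (-100, 40), (-80, 30)]:
        print(lon, lat, round(area_scale(lon, lat), 8))
```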

Among the useful maps in the atlas are those representing the cartographic and 
air photo coverage of the United States, including an evaluation of the utility and 
reliability of such coverage. 

Most of the individual sheets are for sale by the Superintendent of Documents at 
prices of 10 and 15 cents per sheet. Lists of sheets and where to order are supplied 
by the Map Information Office of the U. S. Geological Survey, Washington.

If the sheets continue to be issued with the variety of topics, and with the
statistical and cartographic standards of those now available, the National Atlas should
gradually become an invaluable reference and study tool for all who are interested
in the physical, cultural, social, and economic aspects of the nation. Comparative studies
correlating areally many aspects of national development should be greatly
stimulated by the availability in uniform scale and format of maps showing many of the
more significant distributions.
Vol. 14, No. 4 CONTENTS December 1958

Fitting the Logistic by Maximum Likelihood .... J. L. Hodges, Jr.
Multinomially Grouped Response Times for the Quantal Response Bioassay .... Robert F. White and Joseph G. Graca
Interaction of Genotype and Environment in Continuous Variation. II. Analysis .... R. Morley Jones and Kenneth Mather
Experiments With Two Treatments Per Experimental Unit in the Presence of an Individual Covariate .... C. P. Cox
On an “A Priori” Solution of Haldane’s “A Posteriori” Method (in French) .... Jérôme Lejeune
Joint Analysis of Experiments in Complete Randomised Blocks With Some Common Treatments .... Frederico Pimentel Gomes and Rubens Foot Guimarães
The After-History of Pulmonary Tuberculosis: A Stochastic Model .... David W. Alling
Application of Quantal Response Theory to the Cross-Comparison of Taste Stimuli .... N. T. Gridgeman

Queries and Notes

Note on Hemacytometer Counts .... H. C. Hamaker
Chi-Squares of Bartlett, Mood, and Lancaster in a 2³ Contingency Table .... George W. Snedecor





Biometrics is published quarterly. Its objects are to describe and exemplify the use of 
mathematical and statistical methods in biological and related sciences in a form 
assimilable by experimenters. The annual non-member subscription rate is $7.00. In- 
quiries, orders for back issues, and non-member subscriptions should be addressed to: 


BIOMETRICS 


Department of Statistics 
Virginia Polytechnic Institute 
Blacksburg, Virginia 
STANFORD RESEARCH INSTITUTE

Responsible Professional Opportunities for:
Applied Mathematicians

STANFORD RESEARCH INSTITUTE
MENLO PARK, CALIFORNIA

STATISTICIAN 


Detroit Research Laboratories has opening in statistical 
planning group. Requires person with MS or PhD degree in 
statistics and two to five years’ industrial experience in en- 
gineering, chemical or physical applications. Principal duties 
involve the planning and analysis of experimental work of a 
diverse nature encountered in our automotive and chemical 
research laboratories. The data are often characterized by 
high variability and high cost. A digital computer is avail- 
able to the group. For more particulars write to: 
PERSONNEL MANAGER 


ETHYL CORPORATION 
RESEARCH LABORATORIES 


1600 W. 8 MILE ROAD
FERNDALE 20, MICHIGAN 















Please mention the Journal of the American Statistical Association in writing advertisers

















The Annals of Mathematical Statistics 


THE OFFICIAL JOURNAL OF THE INSTITUTE OF 
MATHEMATICAL STATISTICS 







Vol. 29, No. 4—December 1958 


A High perenne Sample > Pees rer fr A. P. 





mdent Variables Olive 

irkov Chain .... C. J. Burke and M. Rosenblatt
of Fit Criteria for m-th Order 


Efficiency Problems in Polynomial Estimation .... Paul G. Hoel
On ior Sei Canonical Correlation aT 


Significance Level and Power 
Step-Down Procedure in Multivariate Analysis
The Limiting Distribution of the Serial 


Case 
A Limit Theorem for the Periodogram .... Salomon Bochner and Tatsuo Kawata
Proof of Shannon’s Transmission Theorem for Finite-State Indecomposable Channels
.... David Blackwell, Leo Breiman, and A. J. Thomasian
On the Limiting Power Function of the Frequency Chi-Square Test 


Some Exact Results for the Finite Dam 

Minimax Estimation for Linear Regressions i 

Covariances of Least-Squares Estimates When Residuals are Correlated 
M. M. Siddiqui 

On a Probability Problem in the Theory of Counters .... L. Takács


Notes: 

Distribution of Linear Contrasts of Order Statistics . Jacques St-Pierre 
Admissible One-Sided Tests for the Mean of a Rectangular Distribution .... J. W. Pratt
A Method for Selecting the Size of the Initial Sample in Stein’s Two Sample Procedure


News and Notices 
Report of the Cambridge, Massachusetts Meeting 
Report of Officers of the Institute 







Address orders for subscriptions and back numbers to Professor George E. 
Nicholson, Jr., Secretary, Institute of Mathematical Statistics, Department of 
Statistics, University of North Carolina, Chapel Hill, North Carolina. 
STATISTICIANS for
ATOMIC ENERGY 


STATISTICIAN . . . Bachelor’s degree in any physical
science with a minor in statistics or special courses in 
statistics. Minimum of two years in industrial statistics 
(production, engineering, or research). 


Principal Duties: Statistical analysis and presentation of


STATISTICAL TECHNOLOGIST . . . Bachelor’s 
degree in mathematics or any physical science with 
M.S. in statistics. Must be well versed in techniques of 
experimental design as applied to production, engineer- 
ing, and research projects. Ability to organize, com- 
pose, and present technical reports. Minimum of three 
years in quality control or industrial research. 


Position consists primarily of consulting on statistics and 


experimental design with technical personnel involved in 
engineering, production and process development. 


MOVING EXPENSES PAID


Address resume of education and experience 
to employment supervisor, Dept. J-111. 


NATIONAL LEAD COMPANY


OF OHIO 


FEED MATERIALS PRODUCTION CENTER 
P. O. Box 158, Mt. Healthy Station, Cincinnati 31, Ohio 
As a result of our continuing expansion program, we, a large, top-rated
manufacturer of electronic and electromechanical devices, are now offering
this highly-desirable position. We should like to hear from you if you are
a mathematical statistician with an advanced degree and are familiar with
non-parametric and order statistics, Monte Carlo procedures and Markoff ...
Experience in implementing these is desirable, as is some
familiarity with computers, say the IBM-650. Your ability to recognize
and formulate problems is of paramount importance. You will be afforded
the opportunity of making unique contributions in the fields of operations
research, reliability evaluation and prediction, and statistical prediction.

Our plant is located in a moderate-size, ultra-progressive midwestern city,
adjacent to choice ... highly-rated schools. Assistance program for postgraduate
study at university, if desired. Your inquiry held in strict confidence; your present
employer will not be contacted without your authorization. Mail brief resume at once to
Box 400, American Statistical Association, 1757 K St., N.W., Washington 6, 
D.C. 








New Revised Edition 
KENDALL, M. & STUART, A. 


ADVANCED THEORY OF STATISTICS 
Volume 1


Entirely rewritten—three new chapters—new details, ex- 
amples, exercises. 


The current edition of volume 2 ($9.75), which will not be 
revised for several years, can be used readily in conjunction 
with this new edition of volume 1.


433 pp. 6th ed. 1958 $13.50 


STECHERT-HAFNER, Inc. 


FOUNDED IN NEW YORK IN 1872 
The World's Leading International Booksellers 
31 EAST 10TH STREET, NEW YORK 3, NEW YORK 
AN INVITATION
TO JOIN ORO 


Pioneer In Operations Research 


Operations Research is a young science, earning 
recognition rapidly as a significant aid to decision-
making. It employs the services of mathematicians, 
physicists, economists, engineers, political scientists, 
psychologists, and others working on teams to syn- 
thesize all phases of a problem. 


At ORO, a civilian and non-governmental organ- 
ization, you will become one of a team assigned to 
vital military problems in the area of tactics, strategy, 
logistics, weapons systems analysis and communi- 
cations. 


No other Operations Research organization has 
the broad experience of ORO. Founded in 1948 by 
Dr. Ellis A. Johnson, pioneer of U. S. 

ORO’s research findings have influenced decision- 
making on the highest military levels. 


ORO’s professional atmosphere encourages those 
with initiative and imagination to broaden their 
scientific capabilities. 

ORO starting salaries are competitive with those 
of industry and other private research organizations. 
Promotions are based solely on merit. The “fringe”
benefits offered are ahead of those given by many 
companies. 

The cultural and historical features which attract 
visitors to Washington, D. C. are but a short drive 
from the pleasant Bethesda suburb in which ORO is 
located. Attractive homes and apartments are within 
walking distance and readily available in all price 
ranges. Schools are excellent. 


For further information write: 


OPERATIONS RESEARCH OFFICE 


The Johns Hopkins University 


693656 ARLINGTON ROAD 
BETHESDA 14, MARYLAND 
BIOMETRIKA


VOL. 45, ai cpl CONTENTS December, 1958 
Pl rans, Roy” Boe. regoes 11Gb, 88 shed, bai"b70418). Seudiee tn the Te the history of 


LESLIE, P. H. and GOWER, J. C. The sensidiina of a stochastic model for two com- 
peting species 


TANNER, J. C. A problem in the combination of accident frequencies 
DARROCH, J. N. The multiple-recapture census. I. Estimation of a closed population 
BULMER, M. G. Confidence intervals for distance in the analysis of variance 


~~ D. J. The efficiencies of alternative estimators for an asymptotic regression 
equation 


PATTERSON, H. D. The use of autoregression in fitting an exponential curve 
HAIGHT, Frank A. Two queues in parallel 

SHENTON, L. R. Moment estimators and maximum likelihood 

SRIVASTAVA, A. B. L. Effect of non-normality on the power function of t-test 
PEARSON, E. S. Note on Mr. Srivastava’s paper on the power function of Student’s


BARTON, D. E. and CASLEY, D. J. A quick estimate of the regression coefficient 


ZINGER, A. and St-PIERRE, J. On the choice of the best amongst three normal 
populations with known variances 


DRONKERS, J. J. Approximate formulae for the statistical distributions of extreme

HOOPER, J. W. The sampling variance of correlation coefficients under assumptions 
of fixed and mixed variates 

coumgen, H L. The mean deviation, with special reference to samples from a Pearson
Type III population

MERRINGTON, MAXINE and PEARSON, E. S. An approximation to the distribution
of non-central t

FOSTER, F. G. Upper percentage points of the generalized beta distribution. III. 


MENDENHALL, William and HADER, R. Estimation of parameters of mixed
exponentially distributed failure time distributions from censored life test data


Sie William, A bibliography on life testing and related topics. 
Bieestone* ions by D. de 4 & F.N. : av R. COX, Edwin L. 
TELS & MG. DA CUTTMAN D. 'G. KABE, 
Roy We DAN Rita MAURICE A RAMASUBBAL BBAN, 8. N. ROY & R. GNAN- 
SIKAN, M. SANKARAN, B BD V.E 
Corrigenda: N. L. JOHNSON, A. R. KAMAT, J. SAW 


Reviews Other Books Received 


The etppmtotion, AE ie in ea is 54s. J mat OS A volume  wafe Blom post- 
Biometrika 


ria College. Lon wcil, roarn Biometr Sind a must ica, Deparineat oe on a taisticn, Unt 
a London agency. 
Issued by THE BIOMETRIKA OFFICE 
University College, London 
ENGINEERS - SCIENTISTS


The Amherst 
Laboratory 
















Sylvania’s
Center for Communications 
Research & Development 


Statisticians with advanced degrees and a lasting interest in some 
area of applied statistics will find in Sylvania’s Amherst Labora- 
tory both the opportunity and the environment to do challenging 
and creative work. Areas of interest include, BUT ARE NOT 
LIMITED TO, combinatorial design, stochastic processes, Monte 
Carlo methods, statistical decision theory, information theory and 
the design and control of experiments.


Scientists evaluating our employment opportunities will find that we 
amply reward creative ability, not only with a competitive indus- 
trial-range salary, but also with liberal fringe benefits. Sylvania 
provides both life and hospitalization insurance as a company- 
benefit, and encourages membership in professional societies and 
subscriptions to technical publications by paying one-half the cost 
of dues and subscription fees. 


The Amherst Laboratory is located in Western New York, near 
Buffalo, Niagara Falls and the Province of Ontario. 


Statisticians interested in the Amherst Laboratory’s expanding
program of research and development are invited to write, in com- 
plete confidence, to the undersigned. A brief resume of education, 
experience and current interests, together with address and tele- 
phone number, will bring a prompt reply. 


Dr. R. L. San Soucie 
SYLVANIA ELECTRONIC SYSTEMS 
A Division of 


SYLVANIA


SYLVANIA ELECTRIC PRODUCTS INC. 
1188 Wehrle Drive, Amherst 21, New York 
ESTADISTICA


Journal of the Inter American Statistical Institute 
Vol. XVI, No. 59-60 JUNE-SEPTEMBER 1958


Mortality in the State of Sergipe ...... Thomas B. Jabine
Two Articles on Sampling:
On the Theory of Sampling from Finite Populations


On the Presentation of the Results of Sample Surveys as Legal Evidence
(translation) ...... W. Edwards D


A Study of the Economic Aspects of Maize Production in ...
(translation) ...... Lorand D. Schweng


Index Numbers ...
...... Glen T. Barton and Charles E. Burkhead


Procedures for Compiling ... Statistics in Brazil
...... ... de Mesquita Lara


Statistics Concerning Children: Objectives, ... and Suggestions
to Consider ...... ... González hijo


Reorganization of ... of Costa Rica and
Implementation of the “Merit System” for ... ...... ... Jiménez Castro


Special Features. Legal Provisions. ... to Statistics.
Institute Affairs. Statistical News. Publications.


Published quarterly Annual subscription price $3.00 (U.S.) 


INTER AMERICAN STATISTICAL INSTITUTE 
Pan American Union, Washington 6, D.C. 

















JOURNAL OF BUSINESS 


School of Business, University of Chicago, Chicago 37, Illinois 





VOLUME XXXI OCTOBER 1958





International Comparisons of Productivity Trends 
On the Economic Management of Large Organizations: A La...


Religion and its Role in the World of Business 
On Being Fooled by Figures: The Case of Trading Stamps 


Price Variations on Automatic Washing Machines in Chicago, Illinois, among
Different Types of ... ...... Allen F. Jung
Collective Bargaining over Spare: The Automobile Union’s Effort to
Extend Its Frontier of Control
Royal E. Montgomery, Irwin M. Stelzer, Rosalind Roth 
...: A Re-examination


... Received
... University Schools of Business
Index to Volume XXXI 





THE JOURNAL OF BUSINESS is published ... by the University of Chicago ...

... THE JOURNAL OF BUSINESS, Box 12, School of Business ... The address of ...

Correspondence ... addressed to ... Schweiger, Editor, JOURNAL OF BUSINESS. Manuscripts in duplicate ... School of Business ...
McGRAW-HILL BOOKS


MATHEMATICAL METHODS OF 
OPERATIONS RESEARCH 


By THOMAS SAATY, Mathematician in the Executive Office of the Secretary
of the Navy. 432 pages. $10.00


This ...-level book presents the topics of optimization, probability and statistics,
with illustrations for stimulating interest and giving insight into the mathematical
structure involved. The principal mathematical methods are covered, and the previously
scattered theoretical and illustrative literature is brought together.


INTRODUCTION TO STATISTICAL ANALYSIS 


By WILFRED J. DIXON and FRANK J. MASSEY, Jr., University of Cali- 
fornia, Los Angeles. New Second Edition. 488 pages. 


An excellent revision of one of the most popular general and mathematical statistics 
texts. With no calculus prerequisite, it has been adopted in a variety of situations rang- 
ing from math departments and business administration, to biology and agriculture. 
It presents the basic concepts of statistics in a manner that shows the student the 
generality of the application of the statistical method. Both classical and modern tech- 
niques are presented with emphasis on their understanding and use. Much material 
has been brought up to date, with the latter chapters expanded to include several im- 
portant topics. 


ANALOG SIMULATION: Solution of Field Problems 


By WALTER J. KARPLUS, University of California, Los Angeles. McGraw- 
Hill Series in Information Processing & Computers. 434 pages. $10.00 


This text provides a comprehensive survey of analog techniques and systems for solving 
field problems, together with a concise presentation of the mathematical tools necessary 
for their optimum utilization. The approach to the subject is unified. The mathematical 
bridge, linking the characteristic problems of many diverse fields of engineering and 
physics, is developed early in the text. 


A PRIMER OF SOCIAL STATISTICS 


By SANFORD M. DORNBUSCH, Harvard University and CALVIN F. 
SCHMID, University of Washington. 264 pages. $5.25 


Social workers, social scientists, anyone interested in learning elementary statistics will 
welcome this up-to-date coverage of basic statistical concepts and techniques. Written
with simplicity and clarity, this book is unique in that it presumes no mathematical 
knowledge beyond arithmetic. Appropriate mathematical information is introduced at 
the point where it becomes necessary. Mathematical concepts and formulas are presented 
simply and in such a manner as to make their significance and utility immediately 
apparent. Emphasis is placed on the nature of statistical reasoning, including appropri- 
ate explanations of the logic underlying various statistical concepts and techniques. 

Included are discussions of the derivation, application, and interpretation of various
statistical tools.








SEND FOR COPIES ON APPROVAL
McGraw-Hill Book Company, Inc., 330 West 42nd Street, New York 36, N.Y.
THE JOURNAL OF FINANCE


Published by THE AMERICAN FINANCE ASSOCIATION 
Volume XIII, No. 4 (December, 1958) includes:


ARTICLES 

Supplementary Security-Reserve Requirements Reconsidered ......Joseph Aschheim 

Israel's Survey of Consumer Finances ...... Mordechai Kreinin

Large Manufacturing Corporations as Suppliers of Funds to the United States
Government Securities Market ...... William J. Frazer, Jr.

Liquidity Ratios and Recent British Monetary Experience ...... David E. Novack

Some Implications of the Growth of Financial Intermediaries ...... . Donald Shelby 


COMMUNICATIONS 
A Comment on “The Federal ... Loan Bank System and the Control of ...


Membership ..., including $3.00 allocated to subscription to The Journal of Finance ...
... may be purchased for .... Applications for membership in the American Finance Association and subscriptions to The Journal of Finance should be addressed to
... Hassett, ... School of Business Administration,
... New York University ... New York.

... of Finance should be addressed
to the Editor, ... School of Business, ...
or to the Associate Editor, ... School of Business and Public
Administration, Washington University, St. Louis 3, Missouri.











1958 MEMBERSHIP DIRECTORY 
of the 
American Statistical Association 


The 1958 Membership Directory of the American Statistical Association is available for
immediate delivery. This edition contains over 6,500 names. Information includes: member's
name and title; business affiliation and address; other business affiliations, if any; degrees,
with year granted and institution; fields of specialization, including methodological techniques
and fields of application; major types of statistical activities; and sectional interest.

In addition to the alphabetical listing, there is a geographical listing by city and state
for the United States, ....
There is also a complete listing by membership in the five Sections of the Association:
Biometrics Section, Business and Economic Statistics Section, Section on Physical and
Engineering Sciences, Social Statistics Section and Section on the Training of Statisticians.
The new Directory measures 8½” × 11” over-all, and contains 160 pages.


Use the order form below to ... order. An additional charge of 50 cents will be made, per copy,
on orders received without remittance.














ORDER TO: American Statistical Association, 1757 K St., N.W., Washington 6, D.C.


____ copies of the 1958 Membership Directory, @ $4.50 per
copy (remittance included herewith). Or bill me at the $5.00 rate and send invoice.
☐ Payment Enclosed ☐ Bill Us
PRINCETON UNIVERSITY PRESS


A Critique of the United States Income 
and Product Accounts 


Studies in Income & Wealth, Volume 22 
By the Conference on Research in Income & Wealth 


This first systematic and critical analysis of the Office 
of Business Economics Estimates examines many of the 
theoretical issues in national income accounting and dis- 
cusses the income and expenditure sides of the accounts. 
Papers appraise the suitability of the data for short-term 
analysis and the problem of obtaining real national prod- 
uct. The book closes with an analysis of the investment 
and saving components of the accounts and an evaluation 
of some of the basic data. Published for the National 
Bureau of Economic Research. 


616 pages. Tables. Charts. 1958. $11.50 





An Appraisal of the 1950 Census 
Income Data 


Studies in Income and Wealth, Volume 23 
By the Conference on Research in Income & Wealth 


This volume, from the 1956 Conference, deals with the 
nature, reliability, and the uses of the income data
included in the 1950 census. It contrasts these data with
income information from other sources—field surveys, and 
administrative records of government regulatory, fiscal, 
and social security agencies. Various papers deal with sub- 
stantive findings based on income data, survey the fron- 
tiers of size distribution research, and provide an historical 
review of income questions in census surveys. Published 
for the National Bureau of Economic Research. 


462 pages. Tables. Charts. $10.00 


Order from your bookstore, or 
PRINCETON UNIVERSITY PRESS 
Princeton, New Jersey 


Comprehensive Medical 
Services under Voluntary 
Health Insurance 


By BENJAMIN J. DARSKY, M.A., NATHAN SINAI, Dr.P.H., and
SOLOMON J. AXELROD, M.D. Based on a close study of the Windsor (Ontario)
Medical Services plan, this book offers an enlightening discussion of the 
issue of extending voluntary health insurance coverage to include physi- 
cians’ services outside the hospital. Careful examination of the insurance 
plan itself and its effect on the patients and their physicians gives solid 
evidence that such coverage is not only desirable, but thoroughly practical. 
$7.50 


Through your bookseller, or from 
HARVARD UNIVERSITY PRESS
79 Garden Street, Cambridge 38, Massachusetts 
STUDIES IN
Linear and Non-Linear 


Programming 


Kenneth J. Arrow, Leonid Hurwicz, and Hirofumi Uzawa
This integrated collection of research papers considers many new 
aspects of programming and constrained maxima. Among the most 
important features included are: first extensive study made of infinite- 
dimensional programming problems; first proofs and a complete an- 
alysis of the gradient method for programming problems with concave 
and more general functions; simplified proofs of basic existence theo- 
rems for linear and non-linear programming. Stanford Mathematical 
Studies in the Social Sciences, II.

$7.50 


Stanford University Press, Stanford, California 
FROM MACMILLAN


INTRODUCTION TO PROBABILITY AND STATISTICS 

By B. W. Lindgren and G. W. McElrath, University of 
Minnesota 

This introduction to statistics for engineers presents many 
classical and modern statistical methods based on a pre- 
liminary, thorough treatment of the concept of proba- 
bility. Although some knowledge of calculus is assumed, 
the book is not based on a pure mathematical approach 
to statistics. Numerous problems and illustrations, as well 
as such modern techniques as non-parametric tests, are 
included. Coming April 1959 


THE THEORY OF GROUPS 

By Marshall Hall, Jr., Ohio State University 

Presenting the fundamentals of the theory of groups, this 
book includes such noteworthy features as: original ma- 
terial on the Burnside problem and on projective planes, 
a section on the theory of group representation, a lattice 
theoretical approach to properties of sub-group series, 
and a highly interesting interpretation of free groups and 
free products. Coming January 1959 


ELEMENTARY MATRIX ALGEBRA 

By Franz E. Hohn, University of Illinois 

Devoid of unnecessary mathematical formalism, this book 
presents those aspects of elementary matrix algebra which 
are most commonly applied in the physical and social 
sciences. The fundamental properties of determinants are 
fully treated. Published November 1958, $7.50 


The Macmillan Company


60 FIFTH AVENUE, NEW YORK 11, N. Y. 


Please mention the Journal of the American Statistical Association in writing advertisers


























IMPORTANT HOLT TEXTS IN STATISTICS 


ELEMENTARY STATISTICAL METHODS, Rev. 
Helen Walker and Joseph Lev 


STATISTICAL METHODS, 3rd Ed. 
Frederick C. Mills 


STATISTICAL INFERENCE 
Helen M. Walker and Joseph Lev 
TO BE READY IN JANUARY: 


A BASIC COURSE IN 
SOCIOLOGICAL STATISTICS 
Morris Zelditch, Jr. 


HENRY HOLT AND CO.


383 Madison Avenue, New York 17, N. Y. 











clear and precise 


METHODS OF STATISTICAL ANALYSIS 
IN ECONOMICS AND BUSINESS 


E. E. LEWIS, Howard University


$6.25 


HOUGHTON MIFFLIN COMPANY
Boston 7 · New York 16 · Chicago 16 · Dallas 1 · Palo Alto
The IDEA


Behind 
MARKET RESEARCH 
TABULATING 


ECONOMICAL PROCESSING 


STATISTICAL service on market research 
tabulating begins long before a 
button is pushed. 


You get preliminary assistance in resolving 
your ideas . . . in translating sound thinking 
into well-planned questionnaires for the most 
practical and economical processing. 


There is always a best way to handle any
assignment and STATISTICAL can help you apply
it through long experience in methods
and procedures.

The same careful approach is used in processing 
data to assure the highest quality of information.
Strict controls are maintained every step of the way 
from editing and coding to finished report. 


And this professional service is available to you 
days, nights, week-ends, any time you need it.


Write for details today 








STATISTICAL TABULATING CORPORATION
Established 1933 · Michael R. Notaro, President
TABULATING · CALCULATING · TYPING
TEMPORARY OFFICE PERSONNEL

Offices: 53 West Jackson Blvd., Chicago 4, Illinois · Phone: HArrison 7-4500




















Chicago · New York · St. Louis · Newark · Cleveland · Los Angeles
Kansas City · Milwaukee · San Francisco
A complete
tabulating service 
Billing 

Sales Analysis 

Payrolls 

Pension Planning 


Market Research 


JOHN FELIX 
ASSOCIATES 


INCORPORATED


3 EAST 54TH STREET, NEW YORK 22, NEW YORK


PLaza 1-2050 
Announcing the New Journal